Take-home Exercise 03

Author

Leow Xian Zu

Published

October 28, 2024

Modified

November 8, 2024

pacman::p_load(
  sf,          # For handling spatial data and geometries
  tidyverse,   # Collection of packages for data manipulation and visualization
  tmap,        # For creating thematic maps and spatial visualizations
  knitr,       # For dynamic report generation and R markdown documents
  kableExtra,  # For creating nice-looking tables in R markdown
  janitor,     # For cleaning and examining data
  skimr,       # For providing summary statistics about variables in data frames
  stringdist,  # For string similarity
  ggstatsplot, # For EDA
  spdep,       # For spatial autocorrelation
  GWmodel,     # For geographically weighted regression
  corrplot     # For correlation analysis
)

This take-home exercise examines the geography of financial inclusion through geographically weighted regression (GWR) to identify and analyse factors influencing access to financial services. Using the FinScope 2023 dataset for Tanzania, this study will focus on district-level insights, offering a spatial perspective on financial inclusion determinants. The exercise will involve geospatial data wrangling, model diagnostics, and geovisualisation, adhering to the grading criteria for data handling, analytical rigour, effective visual communication, and reproducibility in a Quarto environment. The research is grounded in existing literature on financial inclusion, specifically insights from Tanzania, where financial inclusion plays a pivotal role in economic empowerment and reducing income inequality. The goal is to provide a clear, data-driven understanding of spatial accessibility to financial services, with the results aimed at informing policies for broader economic inclusivity.

The research paper on financial inclusion in Tanzania offers valuable insights into the determinants, barriers, and impacts of financial inclusion, highlighting the role of mobile banking and formal financial services in improving economic well-being. It identifies education and income as key determinants and recognises geographic constraints—such as distance to financial institutions—as significant barriers to access. This aligns closely with the objectives of the take-home exercise, which seeks to model the spatial aspects of financial inclusion at the district level. By employing geographically weighted regression (GWR), this exercise will expand upon the research findings by focusing specifically on spatial variability in financial access. The geographic emphasis in this study offers a nuanced understanding of how location influences financial inclusion, which could lead to targeted interventions to overcome geographic barriers and support underserved regions, thereby complementing the study’s broader socio-economic conclusions.

Understanding Tanzania: A Contextual Introduction

As a Singaporean student analysing financial inclusion in Tanzania, I find it crucial to first understand the country’s unique characteristics. While Singapore and Tanzania might seem vastly different, both share a British colonial history and gained independence in the 1960s. However, their development paths have diverged significantly. Here’s a comprehensive overview of Tanzania that will help frame my analysis:

Why This Context Matters

Before diving into financial inclusion statistics and analysis, understanding Tanzania’s geography, economy, and demographics is essential because:

1. Physical geography influences access to services

2. Economic activities affect financial needs

3. Population distribution impacts service delivery

4. Infrastructure development determines financial service reach

Understanding these features also helps me:

1. Identify potential barriers to financial inclusion

2. Understand regional variations in service access

3. Appreciate the role of mobile money in overcoming geographical challenges

4. Recognise why different regions might need different financial solutions

Coming from Singapore’s context of universal banking access and high technological adoption, this understanding helps me approach the analysis with appropriate context and avoid making assumptions based on my Singaporean experience. The vast differences in geography, population distribution, and economic activities between Tanzania and Singapore highlight why financial inclusion solutions that work in Singapore might not be directly applicable to Tanzania.

Key Comparisons with Singapore

comparison_df <- data.frame(
  Aspect = c("Land Area", "Population", "GDP per capita", 
             "Urbanisation", "Main Economic Sectors", "Capital City"),
  Tanzania = c("945,087 km²", "~61 million", "~$1,300 USD",
               "35.2% urban", "Agriculture, Mining, Tourism", 
               "Dodoma (official), Dar es Salaam (de facto)"),
  Singapore = c("728 km²", "~5.7 million", "~$75,000 USD",
                "100% urban", "Finance, Technology, Trade", "Singapore")
)

kable(comparison_df, 
      caption = "Key Comparisons between Tanzania and Singapore") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2:3, width = "20em")
Key Comparisons between Tanzania and Singapore
Aspect                 Tanzania                                     Singapore
Land Area              945,087 km²                                  728 km²
Population             ~61 million                                  ~5.7 million
GDP per capita         ~$1,300 USD                                  ~$75,000 USD
Urbanisation           35.2% urban                                  100% urban
Main Economic Sectors  Agriculture, Mining, Tourism                 Finance, Technology, Trade
Capital City           Dodoma (official), Dar es Salaam (de facto)  Singapore
rm(comparison_df) # Keep environment clean

Geographical Diversity

Unlike Singapore’s uniform urban landscape, Tanzania presents complex geographical features:

1. Physical Features:

  • 800 km Indian Ocean coastline

  • Great Rift Valley running through central regions

  • Mount Kilimanjaro (Africa’s highest peak)

  • Major lakes (Victoria, Tanganyika)

2. Economic Zones:

  • Northern Circuit (Tourism)

  • Southern Highlands (Agriculture)

  • Lake Zone (Fishing, Mining)

  • Coastal Zone (Trade, Services)

Development Challenges

As a student from Singapore, I notice several contrasts in development challenges:

1. Infrastructure:

  • Transportation networks concentrated in certain regions

  • Rural-urban connectivity issues

  • Varying quality of telecommunications coverage

2. Economic:

  • Large rural population (64.8%)

  • Regional economic disparities

  • Heavy reliance on agriculture

  • Informal sector significance

3. Financial Services:

  • 45 licensed banks (mostly in urban areas)

  • 32.3 million mobile money accounts

  • 65% financial inclusion rate

  • Over 100 microfinance institutions

Administrative Structure

Tanzania’s governance structure affects service delivery:

  • 31 regions

  • 184 districts

  • Two capital cities: Dodoma (official capital, centrally located) and Dar es Salaam (economic hub, coastal location)

In my subsequent analysis, I’ll refer back to these contextual factors to ensure my interpretation of financial inclusion patterns is grounded in Tanzania’s unique circumstances rather than Singapore’s standards.

Load the packages and examine the shapefile

# Read the shapefile
tz_boundaries <- st_read(dsn = "data/geospatial/", 
                        layer = "geoBoundaries-TZA-ADM2")
Reading layer `geoBoundaries-TZA-ADM2' from data source 
  `C:\zzzzzuu\ISSS626GAA\Take-home_Ex\Take-home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 170 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 29.58953 ymin: -11.76235 xmax: 40.44473 ymax: -0.983143
Geodetic CRS:  WGS 84
# Correct misspelled Butiama which was discovered below
tz_boundaries <- tz_boundaries %>%
  mutate(shapeName = case_when(
    shapeName == "Butiam" ~ "Butiama",
    TRUE ~ shapeName  # keeps all other names unchanged
  ))

# Check the CRS (Coordinate Reference System)
st_crs(tz_boundaries)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
# Take a quick look at the data
glimpse(tz_boundaries)
Rows: 170
Columns: 6
$ shapeName  <chr> "Arusha", "Arusha Urban", "Karatu", "Longido", "Meru", "Mon…
$ shapeISO   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ shapeID    <chr> "72390352B32479700182608", "72390352B90906351205470", "7239…
$ shapeGroup <chr> "TZA", "TZA", "TZA", "TZA", "TZA", "TZA", "TZA", "TZA", "TZ…
$ shapeType  <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((36.86084 -3..., MULTIPOLYGON (…
tmap_mode("plot")
tmap mode set to plotting
# Create a quick plot to verify the import
tm_shape(tz_boundaries) +
  tm_borders()

sort(unique(tz_boundaries$shapeName))
  [1] "Arusha"                       "Arusha Urban"                
  [3] "Babati"                       "Babati UrbanBabati Urban"    
  [5] "Bagamoyo"                     "Bahi"                        
  [7] "Bariadi"                      "Biharamulo"                  
  [9] "Buhigwe"                      "Bukoba"                      
 [11] "Bukoba Urban"                 "Bukombe"                     
 [13] "Bunda"                        "Busega"                      
 [15] "Butiama"                      "Chake Chake"                 
 [17] "Chamwino"                     "Chato"                       
 [19] "Chemba"                       "Chunya"                      
 [21] "Dodoma Urban"                 "Gairo"                       
 [23] "Geita"                        "Hai"                         
 [25] "Hanang"                       "Handeni"                     
 [27] "Handeni Mji"                  "Igunga"                      
 [29] "Ikungi"                       "Ilala"                       
 [31] "Ileje"                        "Ilemela"                     
 [33] "Iramba"                       "Iringa"                      
 [35] "Iringa Urban"                 "Itilima"                     
 [37] "Kahama"                       "Kahama Township Authority"   
 [39] "Kakonko"                      "Kalambo"                     
 [41] "Kaliua"                       "Karagwe"                     
 [43] "Karatu"                       "Kaskazini A"                 
 [45] "Kaskazini B"                  "Kasulu"                      
 [47] "Kasulu Township Authority"    "Kati"                        
 [49] "Kibaha"                       "Kibaha Urban"                
 [51] "Kibondo"                      "Kigoma"                      
 [53] "Kigoma  Urban"                "Kilindi"                     
 [55] "Kilolo"                       "Kilombero"                   
 [57] "Kilosa"                       "Kilwa"                       
 [59] "Kinondoni"                    "Kisarawe"                    
 [61] "Kishapu"                      "Kiteto"                      
 [63] "Kondoa"                       "Kongwa"                      
 [65] "Korogwe"                      "Korogwe Township Authority"  
 [67] "Kusini"                       "Kwimba"                      
 [69] "Kyela"                        "Kyerwa"                      
 [71] "Lindi"                        "Lindi Urban"                 
 [73] "Liwale"                       "Longido"                     
 [75] "Ludewa"                       "Lushoto"                     
 [77] "Mafia"                        "Mafinga Township Authority"  
 [79] "Magharibi"                    "Magu"                        
 [81] "Makambako Township Authority" "Makete"                      
 [83] "Manyoni"                      "Masasi"                      
 [85] "Masasi  Township Authority"   "Maswa"                       
 [87] "Mbarali"                      "Mbeya"                       
 [89] "Mbeya Urban"                  "Mbinga"                      
 [91] "Mbogwe"                       "Mbozi"                       
 [93] "Mbulu"                        "Meatu"                       
 [95] "Meru"                         "Micheweni"                   
 [97] "Missenyi"                     "Misungwi"                    
 [99] "Mjini"                        "Mkalama"                     
[101] "Mkinga"                       "Mkoani"                      
[103] "Mkuranga"                     "Mlele"                       
[105] "Momba"                        "Monduli"                     
[107] "Morogoro"                     "Morogoro Urban"              
[109] "Moshi"                        "Moshi Urban"                 
[111] "Mpanda"                       "Mpanda Urban"                
[113] "Mpwapwa"                      "Mtwara"                      
[115] "Mtwara Urban"                 "Mufindi"                     
[117] "Muheza"                       "Muleba"                      
[119] "Musoma"                       "Musoma Urban"                
[121] "Mvomero"                      "Mwanga"                      
[123] "Nachingwea"                   "Namtumbo"                    
[125] "Nanyumbu"                     "Newala"                      
[127] "Ngara"                        "Ngorongoro"                  
[129] "Njombe"                       "Njombe Urban"                
[131] "Nkasi"                        "Nyamagana"                   
[133] "Nyang'hwale"                  "Nyasa"                       
[135] "Nzega"                        "Pangani"                     
[137] "Rombo"                        "Rorya"                       
[139] "Ruangwa"                      "Rufiji"                      
[141] "Rungwe"                       "Same"                        
[143] "Sengerema"                    "Serengeti"                   
[145] "Shinyanga"                    "Shinyanga Urban"             
[147] "Siha"                         "Sikonge"                     
[149] "Simanjiro"                    "Singida"                     
[151] "Singida Urban"                "Songea"                      
[153] "Songea Urban"                 "Songwe"                      
[155] "Sumbawanga"                   "Sumbawanga Urban"            
[157] "Tabora Urban"                 "Tandahimba"                  
[159] "Tanga Urban"                  "Tarime"                      
[161] "Temeke"                       "Tunduma"                     
[163] "Tunduru"                      "Ukerewe"                     
[165] "Ulanga"                       "Urambo"                      
[167] "Uvinza"                       "Uyui"                        
[169] "Wanging'ombe"                 "Wete"                        

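The map appears accurate. However, several districts are split into separate urban or township authority polygons (for example "Arusha" and "Arusha Urban"), which is not useful for district-level analysis, so these splits are merged into their parent districts below.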

# First, create a function to clean urban district names
clean_urban_names <- function(name) {
  # Convert to lowercase for easier matching
  name <- tolower(name)
  
  # Remove common urban suffixes and clean up
  name <- gsub(" urban$| township authority$| mji$", "", name)
  
  # Clean up specific cases
  name <- gsub("babati urbanbabati", "babati", name)
  name <- gsub("korogwe township authority", "korogwe", name)
  name <- gsub("kigoma  urban", "kigoma", name)
  name <- gsub("masasi  township authority", "masasi", name)
  
  # Capitalize first letter of each word
  name <- tools::toTitleCase(name)
  name <- trimws(gsub("\\s+", " ", name))
  return(name)
}

# Apply the transformation and merge polygons
# tz_boundaries is the sf object read from the shapefile above
tz_boundaries_merged <- tz_boundaries %>%
  # Clean the district names
  mutate(shapeName = clean_urban_names(shapeName)) %>%
  # Group by the cleaned district names
  group_by(shapeName) %>%
  # Merge the geometries
  summarise(
    geometry = st_union(geometry),
    .groups = "drop"
  ) %>%
  # Fix any invalid geometries after merging
  st_make_valid() %>%
  # Apply a zero buffer to fix any remaining issues
  st_buffer(0) %>%
  # Final validation check
  st_make_valid()

rm(clean_urban_names)
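
To make the name-cleaning step easier to follow, here is a minimal base R sketch of the same suffix-stripping idea applied to a handful of names taken from the list above. The helper name strip_urban_suffix is hypothetical and the sketch deliberately leaves names in lower case, so it is a simplified illustration rather than the function used in this exercise.

```r
# Hypothetical helper: a simplified version of the suffix-stripping logic.
# It lower-cases the name, drops a trailing urban-type suffix, repairs the
# duplicated "Babati" entry, and collapses stray whitespace.
strip_urban_suffix <- function(name) {
  name <- tolower(name)
  name <- gsub(" urban$| township authority$| mji$", "", name)
  name <- gsub("babati urbanbabati", "babati", name)
  trimws(gsub("\\s+", " ", name))
}

# Sample names copied from the shapefile listing above
sample_names <- c("Arusha Urban", "Babati UrbanBabati Urban", "Kigoma  Urban",
                  "Masasi  Township Authority", "Handeni Mji", "Arusha")

cleaned <- strip_urban_suffix(sample_names)
print(data.frame(original = sample_names, cleaned = cleaned))

# Names that collapse to the same value are the polygons that get merged
print(table(cleaned))
```

In this toy input, "Arusha Urban" and "Arusha" end up with the same cleaned name, which is why grouping by the cleaned name and taking the union of the geometries merges the two polygons.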

The presence of numerous islands may distort the position of a district's centroid, so centroid placement needs careful handling to ensure each point sits in a sensible location.
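
The effect of islands on a centroid can be shown with a little arithmetic: the centroid of a district made of several separate parts is the area-weighted mean of the centroids of those parts. The sketch below uses made-up numbers (a 10 x 10 mainland square and a smaller offshore island), so every value in it is an assumption chosen purely for illustration.

```r
# Toy district: a mainland square spanning x = 0..10 and one offshore island.
# All areas and coordinates are invented for illustration.
parts <- data.frame(
  part = c("mainland", "island"),
  area = c(100, 25),   # area of each part
  cx   = c(5, 40),     # x coordinate of each part's own centroid
  cy   = c(5, 5)       # y coordinate of each part's own centroid
)

# Centroid of the whole multipart shape = area-weighted mean of part centroids
combined_x <- sum(parts$area * parts$cx) / sum(parts$area)
combined_y <- sum(parts$area * parts$cy) / sum(parts$area)

# Centroid when only the largest part is kept
main_part <- parts[which.max(parts$area), ]

cat("All parts:    (", combined_x, ",", combined_y, ")\n")
cat("Largest part: (", main_part$cx, ",", main_part$cy, ")\n")
```

With these numbers the combined centroid has x = 12, which lies outside the mainland square, whereas keeping only the largest part puts the centroid at the centre of the mainland. This is the motivation for keeping only the largest polygon per district before computing centroids.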

# Clean and transform the spatial data
tz_districts <- tz_boundaries_merged %>%
  # Keep only necessary columns
  select(district_name = shapeName,
         geometry) %>%
  # Convert to more appropriate projection for Tanzania
  st_transform(crs = 32737) %>%  # UTM Zone 37S
  # Arrange alphabetically by district name
  arrange(district_name)

# Create a more detailed map to verify the data
tm_shape(tz_districts) +
  tm_polygons(col = "whitesmoke",
             border.col = "gray30",
             border.alpha = 0.5) +
  tm_layout(main.title = "Tanzania Districts",
            main.title.size = 1,
            frame = FALSE) +
  tm_compass(position = c("right", "top")) +
  tm_scale_bar(position = c("left", "bottom"))
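
As a quick sanity check on the choice of EPSG:32737, the standard UTM zone formula can be applied to the longitude range reported in the shapefile's bounding box. This is a small sketch in base R; the helper name utm_zone is hypothetical, and the formula ignores the special-case zones around Norway and Svalbard, which do not matter here.

```r
# Hypothetical helper: standard 6-degree UTM zone number for a longitude
utm_zone <- function(lon) floor((lon + 180) / 6) + 1

# Longitude extent taken from the bounding box printed by st_read() above
lon_range <- c(xmin = 29.58953, xmax = 40.44473)

zones <- utm_zone(lon_range)
print(zones)

# WGS 84 / UTM southern-hemisphere EPSG codes are 32700 + zone number
epsg_south <- 32700 + zones
print(epsg_south)
```

The result shows that the country spans UTM zones 35 to 37, so a single zone (here 37S) is a pragmatic compromise: it works well in the east, while distortion increases towards the western districts.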

# Remove smaller islands in the multipolygons to improve centroid placement
tz_districts_polygon <- tz_districts %>%
  st_cast("POLYGON") %>%
  mutate(area = st_area(.))
Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
sub-geometries for which they may not be constant
# Keep only the largest polygon (the main landmass) for each district
tz_districts_polygon_main <- tz_districts_polygon %>%
  group_by(district_name) %>%
  filter(area == max(area)) %>%
  ungroup() %>%
  dplyr::select(district_name)

# Calculate centroids
tz_centroids_main <- st_centroid(tz_districts_polygon_main)
Warning: st_centroid assumes attributes are constant over geometries
tz_centroids <- st_centroid(tz_districts)
Warning: st_centroid assumes attributes are constant over geometries
# Create the map for tz_districts_polygon_main with centroids overlay
tmap_mode("view")
tmap mode set to interactive viewing
map1 <- tm_shape(tz_districts_polygon_main) +
  tm_polygons() +
  tm_shape(tz_centroids_main) +
  tm_dots(size = 0.1, col = "red") +
  tm_layout(title = "TZ Districts Polygon Main with Centroids")

# Create the map for tz_districts with centroids overlay
map2 <- tm_shape(tz_districts) +
  tm_borders() +
  tm_shape(tz_centroids) +
  tm_dots(size = 0.1, col = "blue") +
  tm_layout(title = "TZ Districts with Centroids")

# Arrange the maps side by side
tmap_arrange(map1, map2)
rm(tz_boundaries,
   tz_boundaries_merged,
   tz_districts,
   tz_centroids,
   tz_districts_polygon
   ) # Keep environment clean

Some important notes:

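  1. I’ve transformed the CRS to UTM Zone 37S (EPSG:32737), a projected CRS that is more appropriate for Tanzania than WGS 84 because it:
-   Uses metres as its unit, so areas and distances can be calculated directly

-   Keeps distance and area distortion small within the zone (UTM is conformal rather than strictly area-preserving, and Tanzania’s western districts lie outside zone 37, so distortion grows towards the west)

-   Is suitable for mapping at district level
  2. The centroids for Uvinza and Kilombero are off-centre. I will shift them manually.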
latitude_shift <- 40000   # This was manually tested and shifted
longitude_shift <- -35000

# Get coordinates of Uvinza's centroid
uvinza_coords <- st_coordinates(tz_centroids_main$geometry[tz_centroids_main$district_name == "Uvinza"])

# Create new point with shifted coordinates
new_uvinza_point <- st_point(c(
  uvinza_coords[1] + longitude_shift,
  uvinza_coords[2] + latitude_shift
))

# Create new geometry collection
new_geometries <- tz_centroids_main$geometry
new_geometries[tz_centroids_main$district_name == "Uvinza"] <- st_sfc(new_uvinza_point, crs = st_crs(tz_centroids_main))

# Create new centroids object with shifted Uvinza point
tz_centroids_shifted <- tz_centroids_main
tz_centroids_shifted$geometry <- new_geometries

# Verify the shift
tmap_mode("view")      # Interactive view for checking; switch to "plot" to save on computing
tmap mode set to interactive viewing
tm_shape(tz_districts_polygon_main) +
  tm_polygons() +
  tm_shape(tz_centroids_shifted) +
  tm_dots(size = 0.1, col = "red") +
  tm_layout(title = "TZ Districts with Shifted Uvinza Centroid")
latitude_shift <- 20000   # This was manually tested and shifted
longitude_shift <- -25000

# Get coordinates of Kilombero's centroid
kilombero_coords <- st_coordinates(tz_centroids_shifted$geometry[tz_centroids_shifted$district_name == "Kilombero"])

# Create new point with shifted coordinates
new_kilombero_point <- st_point(c(
  kilombero_coords[1] + longitude_shift,
  kilombero_coords[2] + latitude_shift
))

# Create new geometry collection
new_geometries2 <- tz_centroids_shifted$geometry
new_geometries2[tz_centroids_shifted$district_name == "Kilombero"] <- st_sfc(new_kilombero_point, crs = st_crs(tz_centroids_shifted))

# Create new centroids object with shifted kilombero point
tz_centroids_shifted2 <- tz_centroids_shifted
tz_centroids_shifted2$geometry <- new_geometries2

# Verify the shift
tmap_mode("view")      # Interactive view for checking; switch to "plot" to save on computing
tmap mode set to interactive viewing
tm_shape(tz_districts_polygon_main) +
  tm_polygons() +
  tm_shape(tz_centroids_shifted2) +
  tm_dots(size = 0.1, col = "red") +
  tm_layout(title = "TZ Districts with Shifted Kilombero Centroid")
rm(tz_centroids_shifted)

Based on visual inspection, the shifted centroids for Uvinza and Kilombero are now correctly placed. With this check done, tmap_mode can be changed from “view” to “plot” to save computing power.

rm(latitude_shift,
   longitude_shift,
   uvinza_coords,
   kilombero_coords,
   new_uvinza_point,
   new_kilombero_point,
   new_geometries,
   new_geometries2,
   tz_centroids_main
   ) # Keep environment clean

Load the Financial Inclusion Survey results

findata <- read_csv("data/aspatial/FinScope Tanzania 2023_Individual Main Data_FINAL.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 9915 Columns: 721
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (703): reg_name, dist_name, ward_code1, ward_name, ea_code, clustertype,...
dbl  (13): SN, reg_code, dist_code, c8c, D6_1_1, D6_1_2, D6_1_3, gov_3, cmg4...
lgl   (5): e_5_1, e_5_2, g_5_2__5, g_5_2__13, serv2_4

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Recode survey districts to the shapefile district they are matched to
# (identified while examining district names for matching below)
findata <- findata %>%
  mutate(dist_name = case_when(
    dist_name == "Kigamboni" ~ "Temeke",
    dist_name == "Arumeru" ~ "Meru",
    dist_name == "Ubungo" ~ "Kinondoni",
    dist_name == "Kibiti" ~ "Rufiji",
    dist_name == "Malinyi" ~ "Ulanga",
    TRUE ~ dist_name  # keeps all other names unchanged
  ))
# Check for district name misspellings

# Function to find best match for district names
find_best_match <- function(survey_name, shapefile_names) {
  # Calculate string distances between the survey name and every shapefile name
  distances <- stringdist(tolower(survey_name), 
                         tolower(shapefile_names),
                         method = "jw")  # Jaro distance (Jaro-Winkler only when p > 0)
  
  # Find the best match (smallest distance)
  best_match <- shapefile_names[which.min(distances)]
  
  # Calculate the similarity score (1 - normalized distance)
  best_score <- 1 - min(distances)
  
  # Only return match if similarity is high enough
  if (best_score >= 0.83) {
    return(best_match)
  } else {
    return(NA_character_)
  }
}

# Create matching dictionary
create_district_dictionary <- function(survey_districts, shapefile_districts) {
  # Create mapping dataframe
  mapping <- data.frame(
    survey_name = unique(survey_districts),
    stringsAsFactors = FALSE
  ) %>%
    mutate(
      # Find best match for each survey district name
      shapefile_name = sapply(survey_name, 
                             find_best_match, 
                             shapefile_districts),
      # Calculate similarity score
      match_score = sapply(survey_name, function(x) {
        distances <- stringdist(tolower(x), 
                              tolower(shapefile_districts),
                              method = "jw")
        1 - min(distances)
      })
    )
  
  # Sort by match score to easily review matches
  mapping <- mapping %>%
    arrange(desc(match_score))
  
  return(mapping)
}



# Apply the matching
mapping <- create_district_dictionary(
  unique(findata$dist_name),
  unique(tz_districts_polygon_main$district_name)
)

# Review the matches
# Print matches with low confidence for manual review
low_confidence <- mapping %>%
  filter(match_score < 0.83) %>%
  arrange(match_score)

print("Low confidence matches that might need manual review:")
[1] "Low confidence matches that might need manual review:"
print(low_confidence)
[1] survey_name    shapefile_name match_score   
<0 rows> (or 0-length row.names)
# Create a function to apply the mapping
standardize_district_names <- function(data, mapping) {
  data %>%
    left_join(mapping %>% 
                select(survey_name, shapefile_name),
              by = c("dist_name" = "survey_name")) %>%
    mutate(dist_name_std = coalesce(shapefile_name, dist_name)) %>%
    select(-shapefile_name)
}

# Apply standardization to your survey data
findata_standardized <- standardize_district_names(findata, mapping)

# Verify the matching worked
verification <- findata_standardized %>%
  group_by(dist_name, dist_name_std) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(dist_name)

# Print summary of changes
cat("\nNumber of districts matched:", sum(!is.na(mapping$shapefile_name)))

Number of districts matched: 144
cat("\nNumber of districts unmatched:", sum(is.na(mapping$shapefile_name)))

Number of districts unmatched: 0
# Display some example matches
cat("\nExample matches (original -> standardized):\n")

Example matches (original -> standardized):
head(mapping %>% filter(!is.na(shapefile_name)), 10) %>%
  mutate(mapping = paste(survey_name, "->", shapefile_name)) %>%
  pull(mapping) %>%
  cat(sep = "\n")
Misungwi -> Misungwi
Missenyi -> Missenyi
Kyela -> Kyela
Kongwa -> Kongwa
Ilala -> Ilala
Iramba -> Iramba
Mbogwe -> Mbogwe
Handeni -> Handeni
Chato -> Chato
Sengerema -> Sengerema
rm(low_confidence,
   mapping,
   verification,
   create_district_dictionary,
   find_best_match,
   standardize_district_names
   ) # Keep environment clean
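
To make the 0.83 similarity threshold used in the matching step more concrete, the sketch below implements Jaro similarity from scratch in base R and applies it to the "Butiam" misspelling found earlier. The function jaro_similarity is a hypothetical, illustrative implementation and not part of stringdist; as far as I understand, stringdist(method = "jw") with its default p = 0 returns one minus this similarity, but treat that correspondence as an assumption.

```r
# Hypothetical from-scratch Jaro similarity (illustrative only)
jaro_similarity <- function(a, b) {
  s1 <- strsplit(tolower(a), "")[[1]]
  s2 <- strsplit(tolower(b), "")[[1]]
  n1 <- length(s1)
  n2 <- length(s2)
  if (n1 == 0 || n2 == 0) return(0)

  # Characters only count as matching within this distance of each other
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  matched1 <- logical(n1)
  matched2 <- logical(n2)

  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!matched2[j] && s1[i] == s2[j]) {
        matched1[i] <- TRUE
        matched2[j] <- TRUE
        break
      }
    }
  }

  m <- sum(matched1)
  if (m == 0) return(0)

  # Half the number of matched characters that appear in a different order
  transpositions <- sum(s1[matched1] != s2[matched2]) / 2
  (m / n1 + m / n2 + (m - transpositions) / m) / 3
}

candidates <- c("Butiama", "Bunda", "Bukoba")
scores <- sapply(candidates, function(x) jaro_similarity("Butiam", x))
print(round(scores, 3))

# Apply the same acceptance rule as the matching step: best score >= 0.83
best <- names(scores)[which.max(scores)]
accepted <- max(scores) >= 0.83
cat("Best match:", best, "| accepted:", accepted, "\n")
```

Here "Butiam" clears the threshold against "Butiama", while "Bunda" does not, which shows how the cut-off separates near-identical spellings from names that merely share a few letters.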

Pick the columns to use for regression

# Select specified columns from the findata dataset
findata_selected <- findata_standardized %>%
 select(
   # Location and cluster information
   dist_name,          # District name
   clustertype,        # Cluster type
   
   # Demographic variables
   c8c,                # Age
   c9,                 # Gender
   c11,                # Education status
   c14,                # Agricultural activity involvement
   
   # Weights
   Household_weight,   # Household level weight
   population_wt,      # Population weight
   
   # Derived financial inclusion indicators
   MM,                 # Mobile money usage
   BANKED,             # Banking services usage
   MFI,                # Microfinance institution usage
   PENSION,            # Pension services usage
   INSURANCE,          # Insurance services usage
   SACCO,              # SACCO membership/usage
   CAPITALM_FUND_MANAGERS,  # Capital market/fund manager usage
   FORM_INVESTMENTS,        # Formal investments
   CMG,                     # Community microfinance group membership
   INFORMAL_MONEYLENDER,    # Informal moneylender usage
   SOCIAL_GROUPS            # Social group membership
 ) %>%
 # Clean column names for consistency
 clean_names()

# Check the structure of selected data
glimpse(findata_selected)
Rows: 9,915
Columns: 19
$ dist_name              <chr> "Misungwi", "Missenyi", "Kyela", "Kongwa", "Ila…
$ clustertype            <chr> "Rural", "Rural", "Urban", "Urban", "Urban", "U…
$ c8c                    <dbl> 47, 63, 74, 29, 53, 39, 24, 55, 45, 56, 51, 36,…
$ c9                     <chr> "Female", "Female", "Male", "Female", "Male", "…
$ c11                    <chr> "Some primary", "No formal education", "Some pr…
$ c14                    <chr> "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", …
$ household_weight       <dbl> 1381.5372, 2986.4383, 1434.8197, 2352.7250, 180…
$ population_wt          <dbl> 3191.1104, 3675.4824, 2043.7091, 4003.1678, 261…
$ mm                     <chr> "MM", "Not MM", "MM", "MM", "MM", "MM", "MM", "…
$ banked                 <chr> "Not Banked", "Not Banked", "Not Banked", "Not …
$ mfi                    <chr> "Not MFI", "Not MFI", "Not MFI", "Not MFI", "No…
$ pension                <chr> "Not PENSION", "Not PENSION", "Not PENSION", "N…
$ insurance              <chr> "0", "0", "INSURANCE", "0", "0", "0", "0", "0",…
$ sacco                  <chr> "Not SACCO", "Not SACCO", "Not SACCO", "Not SAC…
$ capitalm_fund_managers <chr> "Not CAPITALM_FUND_MANAGERS", "Not CAPITALM_FUN…
$ form_investments       <chr> "Not FORM_INVESTMENTS", "Not FORM_INVESTMENTS",…
$ cmg                    <chr> "CMG", "CMG", "CMG", "CMG", "Not CMG", "Not CMG…
$ informal_moneylender   <chr> "Not INFORMAL_MONEYLENDER", "Not INFORMAL_MONEY…
$ social_groups          <chr> "Not SOCIAL_GROUPS", "Not SOCIAL_GROUPS", "Not …
summary(findata_selected)
  dist_name         clustertype             c8c              c9           
 Length:9915        Length:9915        Min.   : 16.00   Length:9915       
 Class :character   Class :character   1st Qu.: 27.00   Class :character  
 Mode  :character   Mode  :character   Median : 37.00   Mode  :character  
                                       Mean   : 39.68                     
                                       3rd Qu.: 50.00                     
                                       Max.   :100.00                     
     c11                c14            household_weight   population_wt     
 Length:9915        Length:9915        Min.   :   50.71   Min.   :   73.48  
 Class :character   Class :character   1st Qu.:  515.84   1st Qu.: 1174.56  
 Mode  :character   Mode  :character   Median : 1040.76   Median : 2287.63  
                                       Mean   : 1427.35   Mean   : 3442.69  
                                       3rd Qu.: 1745.86   3rd Qu.: 4175.85  
                                       Max.   :11680.64   Max.   :50600.52  
      mm               banked              mfi              pension         
 Length:9915        Length:9915        Length:9915        Length:9915       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
  insurance            sacco           capitalm_fund_managers
 Length:9915        Length:9915        Length:9915           
 Class :character   Class :character   Class :character      
 Mode  :character   Mode  :character   Mode  :character      
                                                             
                                                             
                                                             
 form_investments       cmg            informal_moneylender social_groups     
 Length:9915        Length:9915        Length:9915          Length:9915       
 Class :character   Class :character   Class :character     Class :character  
 Mode  :character   Mode  :character   Mode  :character     Mode  :character  
                                                                              
                                                                              
                                                                              
# Get summary statistics of the selected variables
skim(findata_selected)
Data summary
Name                     findata_selected
Number of rows           9915
Number of columns        19
Column type frequency:
  character              16
  numeric                3
Group variables          None

Variable type: character

skim_variable           n_missing  complete_rate  min  max  empty  n_unique  whitespace
dist_name                       0              1    3   12      0       144           0
clustertype                     0              1    5    5      0         2           0
c9                              0              1    4    6      0         2           0
c11                             0              1   10   41      0        10           0
c14                             0              1    2    3      0         2           0
mm                              0              1    2    6      0         2           0
banked                          0              1    6   10      0         2           0
mfi                             0              1    3    7      0         2           0
pension                         0              1    7   11      0         2           0
insurance                       0              1    1    9      0         2           0
sacco                           0              1    5    9      0         2           0
capitalm_fund_managers          0              1   22   26      0         2           0
form_investments                0              1   16   20      0         2           0
cmg                             0              1    3    7      0         2           0
informal_moneylender            0              1   20   24      0         2           0
social_groups                   0              1   13   17      0         2           0

Variable type: numeric

skim_variable     n_missing  complete_rate     mean       sd     p0      p25      p50      p75      p100  hist
c8c                       0              1    39.68    16.65  16.00    27.00    37.00    50.00    100.00  ▇▆▃▂▁
household_weight          0              1  1427.35  1480.22  50.71   515.84  1040.76  1745.86  11680.64  ▇▁▁▁▁
population_wt             0              1  3442.69  3977.97  73.48  1174.56  2287.63  4175.85  50600.52  ▇▁▁▁▁
rm(findata,
   findata_standardized
   ) # Keep environment clean

Exploratory Data Analysis

# Create a bar chart for each categorical variable
lapply(c("c9", "c11", "clustertype", "c14", 
         "mm", "banked", "mfi", "pension", 
         "insurance", "sacco","capitalm_fund_managers",
         "form_investments","cmg",
         "informal_moneylender","social_groups"), function(var) {
  # .data[[var]] replaces the deprecated aes_string()
  ggplot(findata_selected, aes(x = .data[[var]])) +
    geom_bar() +
    theme_minimal() +
    labs(title = paste("Distribution of", var),
         x = var,
         y = "Count") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
})
[Figures: 15 bar charts, one per variable in the order listed above, each titled "Distribution of <variable>"]

Looking at the bar charts of the character variables, each one needs to be aggregated to a district-level variable. All of them are binary except education (c11), so I’ll convert it into an ordinal education score.

# Create education score mapping (listed from lowest to highest score)
education_scores <- c(
  "Don’t know" = 0,
  "No formal education" = 1,
  "Some primary" = 2,
  "Primary completed" = 3,
  "Post primary technical training" = 4,  # After primary but before secondary
  "Some secondary" = 4,
  "Secondary competed-O level" = 5,
  "Secondary completed-A level" = 6,
  "Some University or other higher education" = 6,
  "University or other higher education" = 7,
  "University or higher education completed" = 8
)

# Apply scores to the data; unname() stops the lookup labels being carried into the column,
# and any c11 value missing from the mapping becomes NA
findata_selected_ed <- findata_selected %>%
  mutate(education_score = unname(education_scores[c11]))

rm(education_scores, findata_selected)
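
Indexing a named vector with a label that is not among its names returns NA without a warning, so a misspelt or unexpected education category would quietly drop out of the district averages. A minimal sketch of a check, using toy vectors in place of the real mapping and the c11 column (`scores` and `responses` are illustrative, not the survey data):

```r
# Toy score mapping and responses; the real mapping and c11 column are larger.
scores <- c("No formal education" = 1, "Some primary" = 2, "Primary completed" = 3)
responses <- c("Some primary", "Primary completed", "Primary complete", "No formal education")

# unname() drops the labels that named-vector indexing would otherwise carry along.
mapped <- unname(scores[responses])

# Labels that found no match in the mapping come back as NA.
unmapped <- unique(responses[is.na(mapped)])
print(unmapped)
```

An empty result would mean every response label has a score; anything printed here is a label to add to the mapping or correct.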

Group and aggregate the survey data to form district-level data

district_summary <- findata_selected_ed %>%
 group_by(dist_name) %>%
 summarise(
   # Urbanity - count and percentage of urban areas
   urban_count = sum(clustertype == "Urban", na.rm = TRUE),
   urban_pct = mean(clustertype == "Urban", na.rm = TRUE) * 100,
   
   # Demographics
   median_age = median(as.numeric(c8c), na.rm = TRUE),
   average_ed = mean(as.numeric(education_score), na.rm = TRUE),
   male_count = sum(c9 == "Male", na.rm = TRUE),
   male_pct = mean(c9 == "Male", na.rm = TRUE) * 100,
   agriculture_count = sum(c14 == "Yes", na.rm = TRUE),
   agriculture_pct = mean(c14 == "Yes", na.rm = TRUE) * 100,
   
   # Financial services counts and percentages
   mobile_money_count = sum(mm == "MM", na.rm = TRUE),
   mobile_money_pct = mean(mm == "MM", na.rm = TRUE) * 100,
   
   bank_count = sum(banked == "Banked", na.rm = TRUE),
   bank_pct = mean(banked == "Banked", na.rm = TRUE) * 100,
   
   mfi_count = sum(mfi == "MFI", na.rm = TRUE),
   mfi_pct = mean(mfi == "MFI", na.rm = TRUE) * 100,
   
   pension_count = sum(pension == "PENSION", na.rm = TRUE),
   pension_pct = mean(pension == "PENSION", na.rm = TRUE) * 100,
   
   insurance_count = sum(insurance == "INSURANCE", na.rm = TRUE),
   insurance_pct = mean(insurance == "INSURANCE", na.rm = TRUE) * 100,
   
   sacco_count = sum(sacco == "SACCO", na.rm = TRUE),
   sacco_pct = mean(sacco == "SACCO", na.rm = TRUE) * 100,
   
   capital_count = sum(capitalm_fund_managers == "CAPITALM_FUND_MANAGERS", na.rm = TRUE),
   capital_pct = mean(capitalm_fund_managers == "CAPITALM_FUND_MANAGERS", na.rm = TRUE) * 100,
   
   invest_count = sum(form_investments == "FORM_INVESTMENTS", na.rm = TRUE),
   invest_pct = mean(form_investments == "FORM_INVESTMENTS", na.rm = TRUE) * 100,

   cmg_count = sum(cmg == "CMG", na.rm = TRUE),
   cmg_pct = mean(cmg == "CMG", na.rm = TRUE) * 100,  
   
   moneylender_count = sum(informal_moneylender == "INFORMAL_MONEYLENDER", na.rm = TRUE),
   moneylender_pct = mean(informal_moneylender == "INFORMAL_MONEYLENDER", na.rm = TRUE) * 100,
   
   social_count = sum(social_groups == "SOCIAL_GROUPS", na.rm = TRUE),
   social_pct = mean(social_groups == "SOCIAL_GROUPS", na.rm = TRUE) * 100,  
   
   
   # Total responses per district
   total_respondents = n()
 ) %>%
 # Round all percentage columns to 2 decimal places
 mutate(across(ends_with("_pct"), ~round(., 2)))

# View the first few rows
head(district_summary)
# A tibble: 6 × 32
  dist_name  urban_count urban_pct median_age average_ed male_count male_pct
  <chr>            <int>     <dbl>      <dbl>      <dbl>      <int>    <dbl>
1 Arusha              75     100           35       4.07         29     38.7
2 Babati              15      14.3         34       3.23         43     41.0
3 Bagamoyo            29      39.7         38       3.41         34     46.6
4 Bahi                 0       0           34       1.8          17     37.8
5 Bariadi             30      40           33       2.83         30     40  
6 Biharamulo           0       0           30       2.49         22     48.9
# ℹ 25 more variables: agriculture_count <int>, agriculture_pct <dbl>,
#   mobile_money_count <int>, mobile_money_pct <dbl>, bank_count <int>,
#   bank_pct <dbl>, mfi_count <int>, mfi_pct <dbl>, pension_count <int>,
#   pension_pct <dbl>, insurance_count <int>, insurance_pct <dbl>,
#   sacco_count <int>, sacco_pct <dbl>, capital_count <int>, capital_pct <dbl>,
#   invest_count <int>, invest_pct <dbl>, cmg_count <int>, cmg_pct <dbl>,
#   moneylender_count <int>, moneylender_pct <dbl>, social_count <int>, …
# Check for any districts with suspicious values
summary(district_summary)
  dist_name          urban_count       urban_pct        median_age   
 Length:144         Min.   :  0.00   Min.   :  0.00   Min.   :29.00  
 Class :character   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:34.00  
 Mode  :character   Median : 15.00   Median : 23.07   Median :37.00  
                    Mean   : 23.03   Mean   : 25.75   Mean   :37.19  
                    3rd Qu.: 30.00   3rd Qu.: 34.88   3rd Qu.:40.00  
                    Max.   :181.00   Max.   :100.00   Max.   :50.50  
   average_ed      male_count       male_pct     agriculture_count
 Min.   :1.600   Min.   : 4.00   Min.   :26.67   Min.   :  8.00   
 1st Qu.:2.578   1st Qu.:19.00   1st Qu.:39.91   1st Qu.: 29.00   
 Median :2.900   Median :29.00   Median :43.91   Median : 44.00   
 Mean   :2.967   Mean   :30.47   Mean   :44.27   Mean   : 48.92   
 3rd Qu.:3.278   3rd Qu.:40.00   3rd Qu.:48.89   3rd Qu.: 66.00   
 Max.   :4.446   Max.   :83.00   Max.   :61.33   Max.   :122.00   
 agriculture_pct  mobile_money_count mobile_money_pct   bank_count   
 Min.   :  4.42   Min.   : 10.00     Min.   :30.00    Min.   : 0.00  
 1st Qu.: 67.67   1st Qu.: 24.75     1st Qu.:60.00    1st Qu.: 5.00  
 Median : 83.89   Median : 42.00     Median :70.33    Median : 9.00  
 Mean   : 76.27   Mean   : 48.93     Mean   :68.61    Mean   :14.13  
 3rd Qu.: 92.03   3rd Qu.: 66.00     3rd Qu.:80.00    3rd Qu.:19.25  
 Max.   :100.00   Max.   :167.00     Max.   :95.17    Max.   :87.00  
    bank_pct       mfi_count         mfi_pct       pension_count   
 Min.   : 0.00   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.:10.83   1st Qu.: 1.000   1st Qu.: 2.310   1st Qu.: 0.000  
 Median :17.78   Median : 3.000   Median : 4.785   Median : 1.000  
 Mean   :18.51   Mean   : 4.708   Mean   : 6.068   Mean   : 2.778  
 3rd Qu.:24.14   3rd Qu.: 6.000   3rd Qu.: 8.367   3rd Qu.: 4.000  
 Max.   :48.07   Max.   :34.000   Max.   :22.670   Max.   :18.000  
  pension_pct     insurance_count insurance_pct    sacco_count    
 Min.   : 0.000   Min.   : 0.00   Min.   : 0.00   Min.   :0.0000  
 1st Qu.: 0.000   1st Qu.: 2.00   1st Qu.: 4.44   1st Qu.:0.0000  
 Median : 2.245   Median : 5.00   Median : 8.89   Median :0.0000  
 Mean   : 3.548   Mean   : 7.09   Mean   : 9.42   Mean   :0.8819  
 3rd Qu.: 6.670   3rd Qu.: 9.00   3rd Qu.:13.33   3rd Qu.:1.0000  
 Max.   :17.330   Max.   :47.00   Max.   :30.99   Max.   :9.0000  
   sacco_pct      capital_count     capital_pct      invest_count   
 Min.   : 0.000   Min.   :0.0000   Min.   :0.0000   Min.   : 0.000  
 1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.750  
 Median : 0.000   Median :0.0000   Median :0.0000   Median : 1.000  
 Mean   : 1.222   Mean   :0.2708   Mean   :0.3232   Mean   : 2.924  
 3rd Qu.: 1.840   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.: 4.000  
 Max.   :13.330   Max.   :9.0000   Max.   :6.6700   Max.   :20.000  
   invest_pct        cmg_count         cmg_pct      moneylender_count
 Min.   : 0.0000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.000   
 1st Qu.: 0.6225   1st Qu.: 3.000   1st Qu.: 6.67   1st Qu.: 1.000   
 Median : 2.6700   Median : 7.000   Median :10.00   Median : 2.000   
 Mean   : 3.7360   Mean   : 8.257   Mean   :12.47   Mean   : 2.944   
 3rd Qu.: 6.6700   3rd Qu.:12.000   3rd Qu.:17.33   3rd Qu.: 4.000   
 Max.   :17.3300   Max.   :32.000   Max.   :40.91   Max.   :20.000   
 moneylender_pct   social_count      social_pct     total_respondents
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 15.00   
 1st Qu.: 1.330   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 45.00   
 Median : 2.870   Median : 1.000   Median : 2.070   Median : 60.00   
 Mean   : 4.401   Mean   : 2.076   Mean   : 3.136   Mean   : 68.85   
 3rd Qu.: 6.670   3rd Qu.: 3.000   3rd Qu.: 4.482   3rd Qu.: 90.00   
 Max.   :20.000   Max.   :13.000   Max.   :22.220   Max.   :181.00   
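
The count and percentage pairs in the district summary all follow one pattern, so they could also be generated with `across()` rather than written out column by column. A minimal sketch on made-up rows, assuming each indicator column has a single label that marks a user (the `yes_label` lookup and the "Not ..." labels are illustrative, not taken from the survey):

```r
library(dplyr)

# Toy survey rows; column names mirror the exercise, values are made up.
toy <- tibble(
  dist_name = c("A", "A", "A", "B", "B"),
  mm        = c("MM", "Not MM", "MM", "MM", "Not MM"),
  banked    = c("Banked", "Not banked", "Not banked", "Not banked", "Not banked")
)

# Label that marks a "yes" in each indicator column (assumed for the toy data).
yes_label <- c(mm = "MM", banked = "Banked")

toy_summary <- toy %>%
  group_by(dist_name) %>%
  summarise(
    across(
      all_of(names(yes_label)),
      list(
        # cur_column() gives the name of the column currently being summarised
        count = ~ sum(.x == yes_label[[cur_column()]], na.rm = TRUE),
        pct   = ~ round(mean(.x == yes_label[[cur_column()]], na.rm = TRUE) * 100, 2)
      )
    ),
    total_respondents = n()
  )

print(toy_summary)
```

The resulting columns are named `mm_count`, `mm_pct`, `banked_count` and `banked_pct`, so they would need renaming to match the longer names used in this exercise.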

Join aspatial data with geospatial data

# Join the spatial data
district_summary_spatial <- tz_districts_polygon_main %>%
 left_join(district_summary, 
           by = c("district_name" = "dist_name")) %>%
 st_as_sf()  # ensure it remains as spatial object

# Join centroids
district_summary_spatial <- district_summary_spatial %>%
 left_join(
   tz_centroids_shifted2 %>% 
     st_set_geometry(NULL) %>%  # remove geometry before joining
     select(district_name, everything()),
   by = "district_name"
 )

# Check the join results
print(paste("Number of districts in summary:", nrow(district_summary)))
[1] "Number of districts in summary: 144"
print(paste("Number of districts after spatial join:", nrow(district_summary_spatial)))
[1] "Number of districts after spatial join: 147"
print(paste("Number of districts with centroids:", sum(!is.na(district_summary_spatial$geometry))))
[1] "Number of districts with centroids: 147"
# Check for any districts that didn't match
missing_districts <- district_summary_spatial %>%
 filter(is.na(total_respondents)) %>%
 pull(district_name)

if(length(missing_districts) > 0) {
 print("Districts without survey data:")
 print(missing_districts)
}
[1] "Districts without survey data:"
[1] "Kaskazini a" "Kaskazini b" "Korogwe"     "Mafia"       "Mafinga"    
[6] "Magharibi"   "Makambako"   "Tunduma"    
district_summary_spatial <- district_summary_spatial %>%
  drop_na(mobile_money_count)
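
The checks above report 144 districts in the survey summary, 147 rows after the join and eight polygons without survey data, which leaves 139 matched districts. That suggests about five survey districts found no polygon with an identical name. A left join from the polygon side cannot reveal these, so it may be worth comparing the two name columns directly. A minimal sketch with made-up names:

```r
# Toy name vectors; the real ones would be district_summary$dist_name and
# tz_districts_polygon_main$district_name.
survey_names  <- c("Arusha", "Babati", "Dar es salaam")
polygon_names <- c("Arusha", "Babati", "Dar-es-salaam", "Mafia")

# Survey districts that would be lost in a left join from the polygons.
survey_only  <- setdiff(survey_names, polygon_names)
# Polygons that would end up with NA survey columns.
polygon_only <- setdiff(polygon_names, survey_names)

print(survey_only)
print(polygon_only)
```

Names that appear in `survey_only` are candidates for recoding before the join, for example where spelling or punctuation differs between the two sources.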

EDA to examine data

summary(district_summary_spatial)
 district_name               geometry    urban_count       urban_pct     
 Length:139         POLYGON      :139   Min.   :  0.00   Min.   :  0.00  
 Class :character   epsg:32737   :  0   1st Qu.:  0.00   1st Qu.:  0.00  
 Mode  :character   +proj=utm ...:  0   Median : 15.00   Median : 22.41  
                                        Mean   : 22.57   Mean   : 25.40  
                                        3rd Qu.: 30.00   3rd Qu.: 34.48  
                                        Max.   :181.00   Max.   :100.00  
   median_age      average_ed      male_count       male_pct    
 Min.   :29.00   Min.   :1.600   Min.   : 4.00   Min.   :26.67  
 1st Qu.:34.00   1st Qu.:2.578   1st Qu.:19.00   1st Qu.:39.83  
 Median :37.00   Median :2.884   Median :28.00   Median :44.00  
 Mean   :37.26   Mean   :2.950   Mean   :30.15   Mean   :44.34  
 3rd Qu.:40.00   3rd Qu.:3.234   3rd Qu.:40.00   3rd Qu.:48.89  
 Max.   :50.50   Max.   :4.446   Max.   :83.00   Max.   :61.33  
 agriculture_count agriculture_pct  mobile_money_count mobile_money_pct
 Min.   :  8.00    Min.   :  4.42   Min.   : 10.0      Min.   :30.00   
 1st Qu.: 29.50    1st Qu.: 69.41   1st Qu.: 24.0      1st Qu.:60.00   
 Median : 44.00    Median : 84.44   Median : 42.0      Median :70.00   
 Mean   : 49.43    Mean   : 77.60   Mean   : 48.1      Mean   :68.28   
 3rd Qu.: 66.00    3rd Qu.: 92.64   3rd Qu.: 65.5      3rd Qu.:79.91   
 Max.   :122.00    Max.   :100.00   Max.   :167.0      Max.   :95.17   
   bank_count       bank_pct       mfi_count         mfi_pct      
 Min.   : 0.00   Min.   : 0.00   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 4.50   1st Qu.:10.00   1st Qu.: 1.000   1st Qu.: 2.235  
 Median : 9.00   Median :17.78   Median : 3.000   Median : 5.000  
 Mean   :13.91   Mean   :18.41   Mean   : 4.727   Mean   : 6.122  
 3rd Qu.:18.50   3rd Qu.:24.18   3rd Qu.: 6.000   3rd Qu.: 8.525  
 Max.   :87.00   Max.   :48.07   Max.   :34.000   Max.   :22.670  
 pension_count     pension_pct     insurance_count  insurance_pct   
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 2.000   1st Qu.: 4.625  
 Median : 1.000   Median : 2.220   Median : 5.000   Median : 8.890  
 Mean   : 2.669   Mean   : 3.451   Mean   : 7.094   Mean   : 9.490  
 3rd Qu.: 4.000   3rd Qu.: 6.300   3rd Qu.: 9.000   3rd Qu.:13.330  
 Max.   :18.000   Max.   :17.330   Max.   :47.000   Max.   :30.990  
  sacco_count       sacco_pct      capital_count     capital_pct    
 Min.   :0.0000   Min.   : 0.000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median : 0.000   Median :0.0000   Median :0.0000  
 Mean   :0.8417   Mean   : 1.194   Mean   :0.2806   Mean   :0.3348  
 3rd Qu.:1.0000   3rd Qu.: 1.745   3rd Qu.:0.0000   3rd Qu.:0.0000  
 Max.   :9.0000   Max.   :13.330   Max.   :9.0000   Max.   :6.6700  
  invest_count     invest_pct       cmg_count         cmg_pct     
 Min.   : 0.00   Min.   : 0.000   Min.   : 0.000   Min.   : 0.00  
 1st Qu.: 0.00   1st Qu.: 0.000   1st Qu.: 3.000   1st Qu.: 6.67  
 Median : 1.00   Median : 2.500   Median : 7.000   Median :10.00  
 Mean   : 2.82   Mean   : 3.645   Mean   : 8.252   Mean   :12.60  
 3rd Qu.: 4.00   3rd Qu.: 6.670   3rd Qu.:12.000   3rd Qu.:17.33  
 Max.   :20.00   Max.   :17.330   Max.   :32.000   Max.   :40.91  
 moneylender_count moneylender_pct   social_count     social_pct    
 Min.   : 0        Min.   : 0.000   Min.   : 0.00   Min.   : 0.000  
 1st Qu.: 1        1st Qu.: 1.340   1st Qu.: 0.00   1st Qu.: 0.000  
 Median : 2        Median : 3.330   Median : 1.00   Median : 2.220  
 Mean   : 3        Mean   : 4.507   Mean   : 2.05   Mean   : 3.133  
 3rd Qu.: 4        3rd Qu.: 6.670   3rd Qu.: 3.00   3rd Qu.: 4.485  
 Max.   :20        Max.   :20.000   Max.   :13.00   Max.   :22.220  
 total_respondents
 Min.   : 15.00   
 1st Qu.: 45.00   
 Median : 60.00   
 Mean   : 67.99   
 3rd Qu.: 89.00   
 Max.   :181.00   
# Create histograms and boxplots for the main variables
plot_hist_box <- function(data, var, title) {
  # Create histogram
  p1 <- ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) +  # set bins explicitly
    theme_minimal() +
    labs(title = paste("Histogram of", title),
         x = title,
         y = "Count")
  
  # Create boxplot
  p2 <- ggplot(data, aes(y = .data[[var]])) +
    geom_boxplot(fill = "skyblue", alpha = 0.7) +
    theme_minimal() +
    labs(title = paste("Boxplot of", title),
         y = title)
  
  # Arrange plots side by side
  gridExtra::grid.arrange(p1, p2, ncol = 2)
}

# Create plots for key variables
variables_to_plot <- list(
  c("urban_pct", "Urban Population %"),
  c("median_age", "Median Age"),
  c("average_ed", "Average Education Level"),
  c("male_pct", "Male Population %"),
  c("agriculture_pct", "Agricultural Employment %"),
  c("mobile_money_pct", "Mobile Money Usage %"),
  c("bank_pct", "Bank Account Usage %"),
  c("mfi_pct", "Microfinance Institution Usage %"),
  c("pension_pct", "Pension Scheme Usage %"),
  c("insurance_pct", "Insurance Service Usage %"),
  c("sacco_pct", "Savings & Credit Co-op Usage %"),
  c("cmg_pct", "Community Microfinance Group Usage %"),
  c("capital_pct", "Capital Market Fund Usage %"),
  c("invest_pct", "Formal Investment Usage %"),
  c("moneylender_pct", "Informal Moneylender Usage %")
)

# Generate all plots
for(var in variables_to_plot) {
  plot_hist_box(district_summary_spatial, var[1], var[2])
}
[Figures: a histogram and boxplot pair for each of the 15 variables listed in variables_to_plot]

# Create correlation plot for financial inclusion indicators
financial_vars <- c("mobile_money_pct", "bank_pct", "mfi_pct", 
                   "pension_pct", "insurance_pct", "sacco_pct",
                   "cmg_pct","capital_pct","invest_pct","moneylender_pct")

correlation_data <- district_summary_spatial %>%
  st_drop_geometry() %>%
  select(all_of(financial_vars))

# Create correlation plot
corrplot::corrplot(cor(correlation_data, use = "complete.obs"),
                  method = "color",
                  type = "upper",
                  addCoef.col = "black",
                  tl.col = "black",
                  tl.srt = 45,
                  diag = FALSE)

This correlation heatmap provides insights into the relationships between mobile money usage percentage (mobile_money_pct) and various financial service usage metrics across Tanzania. Focusing on mobile_money_pct, I observed that it has the strongest positive correlations with bank_pct (0.61) and mfi_pct (0.60), suggesting that higher usage of banking and microfinance services is associated with greater adoption of mobile money services. This could imply that regions with more formal financial services may also be more inclined to adopt mobile money, possibly due to higher financial literacy or a stronger financial infrastructure.

Among the other variables, pension_pct shows a moderate positive correlation with mobile_money_pct (0.40) and insurance_pct a weaker one (0.25). Meanwhile, sacco_pct, cmg_pct, and moneylender_pct show weak positive correlations with mobile money usage, while invest_pct has a very weak negative correlation (-0.09), suggesting minimal or mixed associations.

Another interesting observation is the near-perfect correlation between pension_pct and invest_pct (0.98), indicating that the two variables carry almost the same information. Keeping both as regression predictors would introduce multicollinearity, so I will drop invest_pct. Overall, the correlation matrix suggests that formal financial service usage, particularly bank and microfinance services, has the strongest positive association with mobile money adoption across Tanzania. This information could be useful for targeting strategies aimed at increasing mobile money usage, especially in districts with already established banking or microfinance services.
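
Pairs such as pension_pct and invest_pct can also be picked out programmatically rather than read off the heatmap. A minimal sketch on simulated data, with an illustrative cutoff of 0.9 (the exercise does not state a threshold, and `toy`, `cutoff` and `high_pairs` are hypothetical names):

```r
# Toy data: x2 is almost a copy of x1, x3 is unrelated noise.
set.seed(1)
x1 <- rnorm(50)
toy <- data.frame(x1 = x1, x2 = x1 + rnorm(50, sd = 0.05), x3 = rnorm(50))

cor_mat <- cor(toy)

# Keep each pair once (upper triangle) and flag those above the cutoff.
cutoff <- 0.9  # illustrative threshold, not a value from the exercise
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Applied to the real data, `toy` would be replaced by `correlation_data`, and one variable from each flagged pair would be a candidate for removal.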

Regression

# Build the adaptive bandwidth GWR model
bw.adaptive <- bw.gwr(formula = mobile_money_count ~
                        urban_pct +
                        median_age +
                        average_ed +
                        male_pct +
                        agriculture_pct +
                        bank_pct +
                        mfi_pct +
                        pension_pct +
                        sacco_pct +
                        insurance_pct +
                        cmg_pct +
                        moneylender_pct,
                      data = district_summary_spatial,
                      approach = "CV",
                      kernel = "gaussian",
                      adaptive = TRUE,
                      longlat = FALSE)
Adaptive bandwidth: 93 CV score: 75375.78 
Adaptive bandwidth: 65 CV score: 76966.72 
Adaptive bandwidth: 110 CV score: 74948.85 
Adaptive bandwidth: 121 CV score: 74802.25 
Adaptive bandwidth: 127 CV score: 74718.38 
Adaptive bandwidth: 132 CV score: 74643.56 
Adaptive bandwidth: 134 CV score: 74613.04 
Adaptive bandwidth: 136 CV score: 74603.76 
Adaptive bandwidth: 137 CV score: 74598.76 
Adaptive bandwidth: 138 CV score: 74591.25 
Adaptive bandwidth: 138 CV score: 74591.25 
# Fit the GWR model using the optimal bandwidth
gwr.model <- gwr.basic(formula = mobile_money_count ~
                        urban_pct +
                        median_age +
                        average_ed +
                        male_pct +
                        agriculture_pct +
                        bank_pct +
                        mfi_pct +
                        pension_pct +
                        sacco_pct +
                        insurance_pct +
                        cmg_pct +
                        invest_pct +  # NOTE: still included here, though absent from the bandwidth search above
                        moneylender_pct,
                       data = district_summary_spatial,
                       bw = bw.adaptive,
                       kernel = "gaussian",
                       adaptive = TRUE,
                       longlat = FALSE)

# Print model diagnostics
gwr.model
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-11-08 17:04:20.300623 
   Call:
   gwr.basic(formula = mobile_money_count ~ urban_pct + median_age + 
    average_ed + male_pct + agriculture_pct + bank_pct + mfi_pct + 
    pension_pct + sacco_pct + insurance_pct + cmg_pct + invest_pct + 
    moneylender_pct, data = district_summary_spatial, bw = bw.adaptive, 
    kernel = "gaussian", adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  mobile_money_count
   Independent variables:  urban_pct median_age average_ed male_pct agriculture_pct bank_pct mfi_pct pension_pct sacco_pct insurance_pct cmg_pct invest_pct moneylender_pct
   Number of data points: 139
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
    Min      1Q  Median      3Q     Max 
-54.641 -14.540  -1.805  13.866  62.114 

   Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
   (Intercept)     -20.53939   33.89187  -0.606 0.545596    
   urban_pct         0.44051    0.11419   3.858 0.000182 ***
   median_age        0.13876    0.48621   0.285 0.775819    
   average_ed       16.99127    6.90880   2.459 0.015286 *  
   male_pct          0.07126    0.27156   0.262 0.793444    
   agriculture_pct  -0.03117    0.15891  -0.196 0.844793    
   bank_pct          0.22653    0.26398   0.858 0.392477    
   mfi_pct           0.25138    0.51393   0.489 0.625611    
   pension_pct      -1.35136    2.78282  -0.486 0.628096    
   sacco_pct        -1.96007    1.00632  -1.948 0.053686 .  
   insurance_pct     0.43655    0.39172   1.114 0.267222    
   cmg_pct          -0.31954    0.22752  -1.404 0.162662    
   invest_pct        1.15088    2.75064   0.418 0.676369    
   moneylender_pct  -0.35728    0.46184  -0.774 0.440628    

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 21.84 on 125 degrees of freedom
   Multiple R-squared: 0.5346
   Adjusted R-squared: 0.4862 
   F-statistic: 11.04 on 13 and 125 DF,  p-value: 2.077e-15 
   ***Extra Diagnostic information
   Residual sum of squares: 59639.79
   Sigma(hat): 20.86449
   AIC:  1267.028
   AICc:  1270.93
   BIC:  1246.062
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 138 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                          Min.     1st Qu.      Median     3rd Qu.     Max.
   Intercept       -32.4915092 -27.1804597 -22.4779079 -19.4541636 -12.8343
   urban_pct         0.4225319   0.4240663   0.4317376   0.4469388   0.4555
   median_age        0.0948055   0.1341729   0.1641688   0.2046965   0.2344
   average_ed       15.4865555  16.6111845  17.0191379  17.5403920  18.2938
   male_pct          0.0211997   0.0579996   0.0904163   0.1070567   0.1440
   agriculture_pct  -0.0516092  -0.0391837  -0.0255326  -0.0096544  -0.0022
   bank_pct          0.1997025   0.2144771   0.2462946   0.2610998   0.2696
   mfi_pct           0.2267741   0.2277642   0.2336139   0.2611214   0.2776
   pension_pct      -2.2040833  -2.0323281  -1.7827381  -1.3813795  -1.0678
   sacco_pct        -2.1128004  -2.0297480  -1.9866202  -1.8954683  -1.7415
   insurance_pct     0.4099617   0.4231665   0.4583165   0.4843928   0.4918
   cmg_pct          -0.3644414  -0.3445227  -0.3113028  -0.2863834  -0.2784
   invest_pct        0.9538858   1.2109123   1.5399441   1.7334652   1.8925
   moneylender_pct  -0.4629537  -0.4197574  -0.3463350  -0.2934797  -0.2468
   ************************Diagnostic information*************************
   Number of data points: 139 
   Effective number of parameters (2trace(S) - trace(S'S)): 17.64289 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 121.3571 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 1271.901 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 1248.973 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 1172.609 
   Residual sum of squares: 57958.69 
   R-square value:  0.5476804 
   Adjusted R-square value:  0.4813758 

   ***********************************************************************
   Program stops at: 2024-11-08 17:04:20.340749 
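
The bandwidth search and the model fit above spell out their predictors separately, and the two lists differ: the search uses 12 predictors, while the fitted model has 13 because it also contains invest_pct. One way to keep the two calls in step is to build the formula once and pass the same object to both. A minimal sketch, assuming the 12 predictors from the bandwidth search are the intended set (`predictors` and `gwr_formula` are hypothetical names):

```r
# Build the model formula once from a vector of predictor names, so the
# bandwidth search and the model fit cannot drift apart.
predictors <- c("urban_pct", "median_age", "average_ed", "male_pct",
                "agriculture_pct", "bank_pct", "mfi_pct", "pension_pct",
                "sacco_pct", "insurance_pct", "cmg_pct", "moneylender_pct")

gwr_formula <- reformulate(predictors, response = "mobile_money_count")
print(gwr_formula)

# The same object would then be passed to both calls, e.g.
#   bw.gwr(formula = gwr_formula, ...)
#   gwr.basic(formula = gwr_formula, ...)
```

Refitting with a shared formula would change the coefficients and diagnostics printed above, so the output shown here still describes the 13-predictor model.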
# Extract local R2 values
local_r2 <- gwr.model$SDF$Local_R2

# Extract coefficient estimates
coef_estimates <- as.data.frame(gwr.model$SDF)

# Create summary statistics for the first four local coefficients
coef_summary <- data.frame(
  Variable = names(coef_estimates)[1:4],  # Intercept plus the first three predictors only
  Mean = colMeans(coef_estimates[,1:4]),
  Min = apply(coef_estimates[,1:4], 2, min),
  Max = apply(coef_estimates[,1:4], 2, max),
  SD = apply(coef_estimates[,1:4], 2, sd)
)

print(coef_summary)
             Variable        Mean          Min         Max         SD
Intercept   Intercept -22.9949012 -32.49150921 -12.8343327 5.09815958
urban_pct   urban_pct   0.4355998   0.42253188   0.4555461 0.01182212
median_age median_age   0.1675192   0.09480547   0.2344274 0.03915137
average_ed average_ed  17.0309513  15.48655552  18.2937911 0.71052515
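
The summary above covers only the first four columns of the results table. Selecting the coefficient columns by name rather than by position would extend it to every term without depending on column order. A minimal sketch on a toy data frame that stands in for `as.data.frame(gwr.model$SDF)`; the toy columns and values are made up, and the layout (coefficients alongside `y` and `yhat`) is an assumption:

```r
# Toy stand-in for the GWR results table: coefficient columns plus diagnostics.
set.seed(42)
toy_sdf <- data.frame(
  Intercept = rnorm(5, -22, 5),
  urban_pct = rnorm(5, 0.44, 0.01),
  bank_pct  = rnorm(5, 0.23, 0.02),
  y         = c(40, 55, 31, 62, 48),
  yhat      = c(42, 50, 35, 60, 47)
)

# Select coefficient columns by name instead of by position.
coef_cols <- c("Intercept", "urban_pct", "bank_pct")

coef_summary_all <- data.frame(
  Variable = coef_cols,
  Mean = sapply(toy_sdf[coef_cols], mean),
  Min  = sapply(toy_sdf[coef_cols], min),
  Max  = sapply(toy_sdf[coef_cols], max),
  SD   = sapply(toy_sdf[coef_cols], sd),
  row.names = NULL
)
print(coef_summary_all)
```

With the real model, `coef_cols` would hold "Intercept" followed by the predictor names used in the formula.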
summary(gwr.model$SDF$yhat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.33   31.84   45.53   48.04   55.63  122.84 
# First, let's properly prepare the data
# Convert GWR results to appropriate format
gwr_results <- as.data.frame(gwr.model$SDF)

# Ensure the spatial data is properly formatted
district_gwr.sf.combined <- district_summary_spatial %>%
  cbind(Local_R2 = gwr_results$Local_R2) %>%
  st_as_sf()

# Check the structure
str(district_gwr.sf.combined$Local_R2)
 num [1:139] 0.545 0.545 0.552 0.549 0.54 ...
tmap_mode("plot")
tmap mode set to plotting
# Create the map
tm_shape(district_gwr.sf.combined) +
  tm_fill(col = "Local_R2", 
          style = "pretty",
          palette = "viridis",
          title = "Local R-squared Values") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "GWR Model Performance by District",
            main.title.size = 1,
            frame = FALSE) +
  tm_compass(type = "arrow", position = c("right", "top")) +
  tm_scale_bar(position = c("left", "bottom"))

# Create summary statistics of Local R2 values
summary_stats <- summary(district_gwr.sf.combined$Local_R2)
print(summary_stats)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5314  0.5409  0.5473  0.5468  0.5525  0.5601 

In examining the Geographically Weighted Regression (GWR) model performance across districts in Tanzania, I observed that the model’s fit varies by region, although only within a narrow band. The local R-squared values range from 0.531 to 0.560 (the map legend rounds this to 0.530 to 0.565) and indicate how well the model explains the variability in mobile money usage within each district. Higher R-squared values, seen in the southern and south-eastern regions (in green and yellow), suggest a slightly better model fit, meaning the model explains more of the variance in these areas. In contrast, the northern regions (in dark blue/purple) show lower R-squared values, indicating that the model performs marginally less well there. This variation suggests that some of the predictors I’m using, such as education or urbanisation, may be somewhat more relevant in certain areas than in others. It could also mean that there are other local factors influencing mobile money adoption in the northern regions that the current model doesn’t capture.
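
A quick way to gauge how much the local fit actually varies is to compare the spread of the local R-squared values with the global R-squared. The sketch below is illustrative only: it works from the summary figures printed above rather than from `gwr.model` itself, and the object names are hypothetical.

```r
# Local R-squared summary copied from the printed output above
local_r2_summary <- c(min = 0.5314, q1 = 0.5409, median = 0.5473,
                      q3 = 0.5525, max = 0.5601)

# "R-square value" reported in the GWR diagnostic information
global_r2 <- 0.5476804

# Absolute spread between the best- and worst-fitting districts
r2_spread <- unname(local_r2_summary["max"] - local_r2_summary["min"])

# The same spread expressed as a share of the global fit
r2_spread_rel <- r2_spread / global_r2

cat("Spread in local R2:", round(r2_spread, 4), "\n")
cat("Spread as % of global R2:", round(100 * r2_spread_rel, 1), "\n")
```

A spread that is small relative to the global value would point to a model whose explanatory power is fairly uniform across districts.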

Analysis of the regression results

# Create list of all variables we can analyze
gwr_variables <- c(
  "urban_pct" = "Urban Population %",
  "median_age" = "Median Age",
  "average_ed" = "Average Education Level",
  "male_pct" = "Male Population %",
  "agriculture_pct" = "Agricultural Employment %",
  "bank_pct" = "Bank Usage %",
  "mfi_pct" = "Microfinance Institution Usage %",
  "pension_pct" = "Pension Usage %",
  "insurance_pct" = "Insurance Usage %",
  "sacco_pct" = "SACCO Usage %",
  "cmg_pct" = "Credit Management Group Usage %",
  "invest_pct" = "Formal Investment Usage %",
  "moneylender_pct" = "Informal Moneylender Usage %"
)

plot_gwr_stats <- function(variable_name, model = gwr.model, spatial_data = district_summary_spatial) {
  # Extract variable statistics
  stats <- data.frame(
    district_name = spatial_data$district_name,
    coefficient = as.numeric(unlist(model$SDF[[variable_name]])),
    t_value = as.numeric(unlist(model$SDF[[paste0(variable_name, "_TV")]])),
    se_value = as.numeric(unlist(model$SDF[[paste0(variable_name, "_SE")]])))
  
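  # Calculate two-sided p-values from the local t-values
  # (df = 121 approximates the effective degrees of freedom, 121.3571,
  # reported in the GWR diagnostics above)
  stats$p_value <- 2 * pt(abs(stats$t_value), 
                         df = 121,
                         lower.tail = FALSE)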
  
  # Join with spatial data
  analysis_sf <- spatial_data %>%
    left_join(stats, by = "district_name") %>%
    st_as_sf()
  
  # Set tmap mode to plot
  tmap_mode("plot")
  
  # Create coefficient map
  coef_map <- tm_shape(analysis_sf) +
    tm_fill(col = "coefficient",
            style = "quantile",
            n = 5,
            palette = "RdBu",
            midpoint = 0,
            title = "Coefficient Values") +
    tm_borders(alpha = 0.5) +
    tm_layout(main.title = paste(gwr_variables[variable_name], "\nCoefficients"),
              main.title.size = 0.8,
              legend.title.size = 0.7,
              legend.text.size = 0.6,
              frame = FALSE)
  
  # Create p-value map
  p_map <- tm_shape(analysis_sf) +
    tm_fill(col = "p_value",
            style = "fixed",
            breaks = c(0, 0.01, 0.05, 0.1, 1),
            palette = "viridis",
            title = "P-values") +
    tm_borders(alpha = 0.5) +
    tm_layout(main.title = paste(gwr_variables[variable_name], "\nP-values"),
              main.title.size = 0.8,
              legend.title.size = 0.7,
              legend.text.size = 0.6,
              frame = FALSE)
  
  # Arrange maps side by side
  combined_maps <- tmap_arrange(coef_map, p_map, ncol = 2)
  
  # Print summaries
  cat("\nSummary Statistics for", gwr_variables[variable_name], "\n")
  cat("\nCoefficients:\n")
  print(summary(stats$coefficient))
  cat("\nStandard Errors:\n")
  print(summary(stats$se_value))
  cat("\nP-values:\n")
  print(summary(stats$p_value))
  
  # Create significance summary with mean coefficients
  significance_counts <- data.frame(
    Significance = c("Highly significant (p < 0.01)",
                    "Significant (0.01 ≤ p < 0.05)",
                    "Marginally significant (0.05 ≤ p < 0.1)",
                    "Not significant (p ≥ 0.1)"),
    Count = c(
      sum(stats$p_value < 0.01),
      sum(stats$p_value >= 0.01 & stats$p_value < 0.05),
      sum(stats$p_value >= 0.05 & stats$p_value < 0.1),
      sum(stats$p_value >= 0.1)
    ),
    Percentage = c(
      mean(stats$p_value < 0.01),
      mean(stats$p_value >= 0.01 & stats$p_value < 0.05),
      mean(stats$p_value >= 0.05 & stats$p_value < 0.1),
      mean(stats$p_value >= 0.1)
    ) * 100,
    Mean_Coefficient = c(
      mean(stats$coefficient[stats$p_value < 0.01]),
      mean(stats$coefficient[stats$p_value >= 0.01 & stats$p_value < 0.05]),
      mean(stats$coefficient[stats$p_value >= 0.05 & stats$p_value < 0.1]),
      mean(stats$coefficient[stats$p_value >= 0.1])
    )
  )
  
  cat("\nSignificance Summary:\n")
  print(significance_counts)
  
  # Print additional interpretation
  cat("\nInterpretation:\n")
  cat("Coefficient range:", round(min(stats$coefficient), 3), "to", round(max(stats$coefficient), 3), "\n")
  cat("Mean coefficient:", round(mean(stats$coefficient), 3), "\n")
  cat("Percentage of significant coefficients (p < 0.05):", 
      round(mean(stats$p_value < 0.05) * 100, 1), "%\n")
  
  # Return maps and data
  return(list(maps = combined_maps))
}
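
The p-value step inside `plot_gwr_stats()` can be checked by hand. The sketch below is a minimal illustration, assuming that the `_TV` column holds the local coefficient divided by its local standard error; the two input numbers are the mean coefficient and mean standard error reported for `urban_pct` in this section, and the variable names are hypothetical. It also compares the hard-coded `df = 121` with the effective degrees of freedom from the diagnostics.

```r
# Illustrative values reported for urban_pct in this section
coef_est <- 0.4356   # mean local coefficient
se_est   <- 0.1150   # mean local standard error

# Local t-statistic: coefficient divided by its standard error (assumed)
t_val <- coef_est / se_est

# Two-sided p-value with the hard-coded df used in plot_gwr_stats()
p_fixed <- 2 * pt(abs(t_val), df = 121, lower.tail = FALSE)

# Same calculation with the effective degrees of freedom from the
# diagnostics (121.3571); pt() accepts non-integer df
p_edf <- 2 * pt(abs(t_val), df = 121.3571, lower.tail = FALSE)

print(c(t = t_val, p_df121 = p_fixed, p_edf = p_edf))
```

The two p-values should be almost identical, which suggests that rounding the degrees of freedom down to 121 has no practical effect on the significance classes.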

Examining how urbanisation affects use of mobile money

plot_gwr_stats("urban_pct")
tmap mode set to plotting

Summary Statistics for Urban Population % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4225  0.4241  0.4317  0.4356  0.4469  0.4555 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1147  0.1149  0.1150  0.1150  0.1151  0.1152 

P-values:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0001279 0.0001676 0.0002715 0.0002520 0.0003376 0.0003546 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)   139        100        0.4355998
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)     0          0              NaN

Interpretation:
Coefficient range: 0.423 to 0.456 
Mean coefficient: 0.436 
Percentage of significant coefficients (p < 0.05): 100 %
$maps

The relationship between urban population percentage and mobile money usage in Tanzania shows a geographical gradient, although within a narrow band. The coefficient map on the left indicates that values range from 0.423 to 0.456, with the darkest regions in the south and southeast showing the strongest positive association between urban population percentage and mobile money usage. In contrast, the lighter northern regions have lower coefficients, suggesting a slightly weaker association between urbanisation and mobile money adoption. The p-values map on the right confirms that this relationship is statistically significant in all 139 districts, every one of which has a p-value below 0.01. This consistent significance indicates that, despite regional differences, urban population percentage remains a key predictor of mobile money usage throughout Tanzania.
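
To make the size of this regional difference concrete, the sketch below applies the lowest and highest local coefficients to the same hypothetical change in urban share. It assumes the response is measured in percentage points of mobile money usage and that the other predictors are held fixed; the 10-point change and the variable names are illustrative only.

```r
# Range of local urban_pct coefficients from the printed summary
coef_min <- 0.4225
coef_max <- 0.4555

# Hypothetical change in urban population share (percentage points)
delta_urban <- 10

# Implied change in the response, other predictors held fixed
effect_min <- coef_min * delta_urban
effect_max <- coef_max * delta_urban

cat("Implied change at lowest coefficient: ", effect_min, "\n")
cat("Implied change at highest coefficient:", effect_max, "\n")
cat("Difference between the two:", round(effect_max - effect_min, 2), "\n")
```

Under these assumptions the gap between the weakest and strongest districts is a fraction of a percentage point, which is worth bearing in mind when reading the regional explanations that follow.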

A closer examination of geographical features provides insight into these spatial patterns. The Southern Highlands—notably the agricultural districts of Mbeya, Iringa, and Njombe—depend heavily on farming, with predominantly rural populations. In these areas, the positive relationship between urban population percentage and mobile money usage likely reflects the limited financial infrastructure in rural zones. As urban centres develop in these highland regions, they become focal points for financial services, promoting mobile money adoption among rural residents seeking alternatives to traditional banking.

By contrast, in the Lake Victoria basin in the northwest, covering regions like Mwanza, Mara, and Kagera, the economy is also largely agricultural, but the association between urbanisation and mobile money usage is weaker. Rural populations here often rely on informal financial systems and may have limited exposure to mobile financial services. This reliance on agriculture, together with high rural population density, may help explain the lower coefficients, which reflect a more limited influence of urbanisation on mobile money adoption in these areas.

Conversely, coastal regions such as Dar es Salaam and Zanzibar demonstrate a stronger positive relationship. As major economic and trade hubs, these coastal areas are highly urbanised and equipped with well-developed infrastructure for financial services and mobile connectivity. Urbanisation here enhances accessibility to mobile money platforms, with residents and businesses readily adopting mobile financial services. This dense service network along the coast contributes to higher coefficients in the south and southeast, where urbanisation plays a substantial role in expanding financial inclusion.

In summary, while urban population percentage is a significant predictor of mobile money usage across Tanzania, the strength of the association varies modestly by region, plausibly because of specific geographical and economic characteristics. Agricultural areas in the Lake Victoria basin, dominated by rural populations, exhibit the weaker associations, likely due to limited financial infrastructure and reliance on traditional financial practices. In contrast, the south and southeast, along with coastal economic hubs like Dar es Salaam and Zanzibar, show a stronger positive association between urbanisation and mobile money usage. These findings highlight the importance of region-specific strategies for promoting financial inclusion, with a focus on improving infrastructure in agricultural regions and leveraging established urbanisation in coastal areas to maximise mobile money adoption.

Examining how age affects use of mobile money

plot_gwr_stats("median_age")
tmap mode set to plotting

Summary Statistics for Median Age 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.09481 0.13417 0.16417 0.16752 0.20470 0.23443 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4882  0.4891  0.4896  0.4895  0.4898  0.4912 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.6338  0.6767  0.7380  0.7336  0.7844  0.8471 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100        0.1675192

Interpretation:
Coefficient range: 0.095 to 0.234 
Mean coefficient: 0.168 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

In examining the relationship between median age and mobile money usage across Tanzania, I noticed that the coefficients show a regional gradient, but the p-values indicate that it should not be over-interpreted. The coefficient map on the left shows values ranging from 0.095 to 0.234, with darker areas, particularly in the central and northeastern regions, showing higher coefficients. Taken at face value, this would mean that mobile money usage rises more with median age in these areas than in the lighter western and southern regions, where the coefficients are lower. The p-values map on the right, however, shows that every district falls in the highest p-value class (0.10 to 1.00), represented in yellow; the p-values range from 0.63 to 0.85 across all 139 districts. The relationship between median age and mobile money usage is therefore not statistically significant anywhere in Tanzania, and any apparent association between age and mobile money usage in this dataset is weak and unreliable.
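
One way to see why the median age coefficients are not informative is to look at an approximate confidence interval. The sketch below is illustrative, not part of the original workflow: it uses the mean local coefficient and mean local standard error printed above, the same `df = 121` as `plot_gwr_stats()`, and hypothetical variable names.

```r
# Mean local coefficient and standard error for median_age (printed above)
coef_est <- 0.1675
se_est   <- 0.4895
df_resid <- 121      # degrees of freedom used in plot_gwr_stats()

# Critical t-value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Approximate 95% confidence interval for the coefficient
ci <- c(lower = coef_est - t_crit * se_est,
        upper = coef_est + t_crit * se_est)
print(round(ci, 3))

# TRUE when the interval contains zero, i.e. the sign is not pinned down
contains_zero <- unname(ci["lower"] < 0 && ci["upper"] > 0)
print(contains_zero)
```

Because the standard error is roughly three times the size of the coefficient, the interval spans both negative and positive values, which is consistent with the high p-values reported for every district.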

Examining how education affects use of mobile money

plot_gwr_stats("average_ed")
tmap mode set to plotting

Summary Statistics for Average Education Level 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  15.49   16.61   17.02   17.03   17.54   18.29 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  6.938   6.948   6.958   6.957   6.964   6.984 

P-values:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.009791 0.012979 0.015665 0.016315 0.018415 0.028370 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     3   2.158273         18.27427
2           Significant (0.01 ≤ p < 0.05)   136  97.841727         17.00353
3 Marginally significant (0.05 ≤ p < 0.1)     0   0.000000              NaN
4               Not significant (p ≥ 0.1)     0   0.000000              NaN

Interpretation:
Coefficient range: 15.487 to 18.294 
Mean coefficient: 17.031 
Percentage of significant coefficients (p < 0.05): 100 %
$maps

In examining the relationship between average education level and mobile money usage across Tanzania, notable geographic patterns emerge in both the coefficient values and the p-values. The coefficient map on the left shows a range from 15.49 to 18.29, with darker regions in the north and northeast displaying higher coefficients. This suggests that in these areas, average education level has a stronger positive effect on mobile money usage; as education levels increase, mobile money adoption is expected to rise more significantly. These northern regions, including Arusha, Kilimanjaro, and parts of Tanga, have relatively high education levels compared to other areas. This stronger correlation may be influenced by the presence of educational institutions and greater economic opportunities in these regions, which create a favourable environment for the adoption of mobile financial services. Additionally, these regions are close to major tourist areas like the Serengeti and Ngorongoro, where the influence of tourism and a higher influx of educated individuals may further contribute to this positive effect.

In contrast, the lighter-shaded regions in the south and west, including areas like Rukwa, Katavi, and parts of the Southern Highlands, show lower coefficients, indicating a weaker relationship between education and mobile money usage. These areas are generally more rural and agricultural, with lower average education levels and limited access to financial infrastructure. In these regions, even where education levels increase, the impact on mobile money adoption appears less substantial, possibly due to economic activities that are less reliant on formal financial systems or due to limited exposure to mobile financial services.

The p-values map on the right further supports these findings: all 139 districts have p-values below 0.05. In 136 districts the p-value lies between 0.01 and 0.05, while three districts, which likely correspond to the small, differently shaded area in the far northeast, fall below 0.01. This indicates that the relationship between education and mobile money usage is statistically significant throughout Tanzania. The significance across regions underscores the role of education as a consistent predictor of mobile money usage, though its strength varies depending on regional characteristics.

Regional Analysis and Implications

In the northern and northeastern regions, where education appears to have a stronger impact, the connection between higher education levels and mobile money usage may reflect a more developed financial ecosystem and greater economic diversity. Urban centres in this area, such as Arusha and Moshi (near Kilimanjaro), likely provide residents with more access to financial services, fostering a positive environment for mobile money adoption among educated populations. This suggests that policies encouraging education in these areas could further boost financial inclusion, as residents are already predisposed to adopt mobile financial services.

In contrast, the southern and western regions are more agrarian, with economic activities focused on farming and limited urbanisation. Here, the lower coefficients suggest that increasing education alone may not be sufficient to significantly boost mobile money usage without concurrent investments in infrastructure and financial access. Regions like the Southern Highlands (Mbeya, Rukwa) and the Lake Tanganyika area are marked by lower population densities and limited telecommunications infrastructure, which may explain why the relationship between education and mobile money adoption is weaker. Efforts to improve financial inclusion in these areas may need to focus not only on education but also on expanding infrastructure and financial literacy programs tailored to rural communities.

In summary, this analysis highlights that while average education level is a meaningful predictor of mobile money usage across Tanzania, the strength of its impact is geographically variable. Regions in the north and northeast, which are more economically diverse and urbanised, show a stronger positive relationship, reflecting how education can enhance financial inclusion when combined with accessible financial services and infrastructure. Meanwhile, the weaker association in the rural south and west suggests that improving mobile money adoption in these areas will require multi-faceted strategies that go beyond education to address underlying infrastructure and economic challenges. Recognising these regional differences allows for targeted interventions that align with the specific socio-economic and geographical needs of each area, ultimately advancing financial inclusion in a way that reflects Tanzania’s diverse landscape.

Examining how gender affects use of mobile money

plot_gwr_stats("male_pct")
tmap mode set to plotting

Summary Statistics for Male Population % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.02120 0.05800 0.09042 0.08424 0.10706 0.14399 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2727  0.2731  0.2732  0.2733  0.2734  0.2746 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.6009  0.6960  0.7413  0.7600  0.8320  0.9385 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100       0.08424209

Interpretation:
Coefficient range: 0.021 to 0.144 
Mean coefficient: 0.084 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

In exploring the influence of gender on mobile money usage across Tanzania, I see that the relationship between male population percentage and mobile money usage is not statistically significant in any of the 139 districts, with p-values ranging from 0.60 to 0.94. This lack of significance means that while the coefficients are positive (0.021 to 0.144), the relationship is not strong enough to be considered reliable in this dataset.

Examining how agricultural activity affects use of mobile money

plot_gwr_stats("agriculture_pct")
tmap mode set to plotting

Summary Statistics for Agricultural Employment % 

Coefficients:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.051609 -0.039184 -0.025533 -0.024973 -0.009654 -0.002237 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1595  0.1599  0.1600  0.1600  0.1601  0.1606 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.7476  0.8068  0.8733  0.8768  0.9520  0.9889 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100      -0.02497346

Interpretation:
Coefficient range: -0.052 to -0.002 
Mean coefficient: -0.025 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

In exploring the influence of agricultural employment percentage on mobile money usage across Tanzania, I see that the relationship is not statistically significant in any of the 139 districts, with p-values ranging from 0.75 to 0.99. This lack of significance means that while the coefficients are slightly negative (-0.052 to -0.002), the relationship is not strong enough to be considered reliable in this dataset.

Examining how banking use affects use of mobile money

plot_gwr_stats("bank_pct")
tmap mode set to plotting

Summary Statistics for Bank Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1997  0.2145  0.2463  0.2397  0.2611  0.2696 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2649  0.2655  0.2666  0.2663  0.2672  0.2674 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3146  0.3306  0.3559  0.3719  0.4206  0.4536 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100        0.2396519

Interpretation:
Coefficient range: 0.2 to 0.27 
Mean coefficient: 0.24 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.31 and 0.45 in every district, the effect of bank usage on mobile money usage is not statistically significant.

Examining how microfinance institution usage affects use of mobile money

plot_gwr_stats("mfi_pct")
tmap mode set to plotting

Summary Statistics for Microfinance Institution Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2268  0.2278  0.2336  0.2439  0.2611  0.2776 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5162  0.5170  0.5174  0.5175  0.5181  0.5196 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5926  0.6146  0.6523  0.6385  0.6604  0.6619 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100         0.243943

Interpretation:
Coefficient range: 0.227 to 0.278 
Mean coefficient: 0.244 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.59 and 0.66 in every district, the effect of microfinance institution (MFI) usage on mobile money usage is not statistically significant.

Examining how pension usage affects use of mobile money

plot_gwr_stats("pension_pct")
tmap mode set to plotting

Summary Statistics for Pension Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -2.204  -2.032  -1.783  -1.708  -1.381  -1.068 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.788   2.792   2.814   2.806   2.817   2.820 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4358  0.4719  0.5264  0.5474  0.6214  0.7026 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100        -1.707537

Interpretation:
Coefficient range: -2.204 to -1.068 
Mean coefficient: -1.708 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.44 and 0.70 in every district, the effect of pension usage on mobile money usage is not statistically significant.

Examining how usage of insurance affects use of mobile money

plot_gwr_stats("insurance_pct")
tmap mode set to plotting

Summary Statistics for Insurance Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4100  0.4232  0.4583  0.4552  0.4844  0.4918 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3936  0.3938  0.3941  0.3943  0.3947  0.3959 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2143  0.2214  0.2486  0.2519  0.2848  0.3001 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100        0.4551736

Interpretation:
Coefficient range: 0.41 to 0.492 
Mean coefficient: 0.455 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

With p-values between 0.21 and 0.30 in every district, the effect of insurance usage on mobile money usage is not statistically significant.

Examining how SACCO usage affects use of mobile money

plot_gwr_stats("sacco_pct")
tmap mode set to plotting

Summary Statistics for SACCO Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -2.113  -2.030  -1.987  -1.961  -1.895  -1.741 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.009   1.010   1.011   1.011   1.012   1.017 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.03961 0.04690 0.05154 0.05590 0.06304 0.08932 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0    0.00000              NaN
2           Significant (0.01 ≤ p < 0.05)    61   43.88489        -2.043493
3 Marginally significant (0.05 ≤ p < 0.1)    78   56.11511        -1.897046
4               Not significant (p ≥ 0.1)     0    0.00000              NaN

Interpretation:
Coefficient range: -2.113 to -1.741 
Mean coefficient: -1.961 
Percentage of significant coefficients (p < 0.05): 43.9 %
$maps

In examining the relationship between SACCO (Savings and Credit Cooperative Organization) usage percentage and mobile money usage across Tanzania, distinct geographical patterns emerge in both the coefficient and p-value maps. The coefficient map on the left shows values ranging from -2.113 to -1.741, all negative, with the darkest shades in the northern regions indicating the strongest negative coefficients. This suggests that in these areas, an increase in SACCO usage percentage is associated with a significant decrease in mobile money usage, potentially reflecting a preference for SACCOs over mobile financial services. Regions like Kilimanjaro, Arusha, and parts of Mara in the north may have deeply rooted SACCO networks that serve as primary financial institutions, reducing the need for mobile money options. SACCOs in these areas likely provide accessible and trusted financial services, and residents may view them as a stable alternative to newer mobile financial solutions, particularly where SACCOs have historically established strong ties within communities.

In contrast, the lighter regions in the southern areas of Tanzania, including districts in Mbeya and Ruvuma, display smaller negative coefficients, indicating a weaker inverse relationship between SACCO usage and mobile money adoption. These southern regions, though they may have SACCOs, do not exhibit the same level of negative association, possibly because mobile money services are more widely accepted or integrated with SACCOs, or because SACCO presence is less dominant. This could reflect a more flexible financial ecosystem in the south, where mobile money services and SACCOs coexist without significant competition for users.

The p-value map on the right shows that all districts have p-values between 0.04 and 0.09, represented by dark blue and green shading. The relationship between SACCO usage and mobile money usage is significant at the 5% level in 61 districts (43.9%) and only marginally significant (0.05 ≤ p < 0.1) in the remaining 78 (56.1%), so the evidence is suggestive rather than conclusive. Combined with the uniformly negative coefficients, it suggests that SACCO usage may play a role in influencing mobile money adoption. In regions with a strong SACCO presence, these cooperatives may fulfil financial needs that mobile money platforms otherwise would, thereby limiting mobile money’s role.
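
The split between significant and marginally significant districts depends entirely on where each p-value falls relative to the 0.05 threshold. The sketch below shows one compact way to assign the same significance classes with `cut()`. It is an illustration rather than the method used in `plot_gwr_stats()`, and the five inputs are the minimum, quartile, and maximum p-values printed for `sacco_pct`, not the full set of 139 values.

```r
# Minimum, quartiles, and maximum of the sacco_pct p-values (printed above)
p_vals <- c(0.03961, 0.04690, 0.05154, 0.06304, 0.08932)

# Bin with the same break points as the p-value map and summary table;
# right = FALSE gives intervals [a, b), matching the "<" comparisons
sig_class <- cut(
  p_vals,
  breaks = c(0, 0.01, 0.05, 0.1, 1),
  labels = c("p < 0.01", "0.01 <= p < 0.05", "0.05 <= p < 0.1", "p >= 0.1"),
  right = FALSE,
  include.lowest = TRUE   # so that p = 1 still falls in the last bin
)

# Count how many of the illustrative values fall in each class
sig_table <- table(sig_class)
print(sig_table)
```

Even in this small sample, the values straddle the 5% threshold, which mirrors the near-even split seen in the full significance summary.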

Regional Implications and Analysis

In the northern and northeastern regions, where SACCO usage has a stronger negative association with mobile money adoption, the preference for SACCOs may be shaped by socio-cultural factors and the structure of the local economy. These regions often have stronger communal and cooperative financial practices, where SACCOs are community-driven and cater to collective needs, making them highly trusted institutions. Additionally, SACCOs offer specific financial services, such as loans and savings, that might be seen as more comprehensive compared to mobile money, which primarily focuses on payments and transfers. In these areas, policies to promote mobile money may need to address this preference by exploring potential partnerships between mobile money providers and SACCOs or by expanding the financial services available through mobile platforms to make them more competitive.

In the southern regions, where the inverse relationship is weaker, mobile money adoption might coexist more readily with SACCO services, suggesting that SACCOs do not dominate the financial landscape to the same extent as in the north. This could be due to a more diversified financial ecosystem, where users feel comfortable accessing both SACCO services and mobile money options. Here, policies to promote mobile money may be more straightforward, focusing on improving accessibility and enhancing user awareness without facing the same level of competition from SACCOs.

Overall, these findings highlight that SACCO usage acts as a significant factor in limiting mobile money adoption in Tanzania, with stronger effects in the north where SACCOs are deeply embedded in the financial culture. The need for tailored approaches becomes clear: in SACCO-dominant regions, efforts to increase mobile money adoption might focus on building alliances with SACCOs or providing similar financial services through mobile platforms. In regions where the SACCO influence is weaker, mobile money services could expand through straightforward strategies like awareness campaigns and infrastructure investments. Recognising these regional differences allows for interventions that respect local financial preferences while still advancing the broader goal of financial inclusion across Tanzania.

Examining how Credit Management Group (CMG) usage affects use of mobile money

plot_gwr_stats("cmg_pct")
tmap mode set to plotting

Summary Statistics for Credit Management Group Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3644 -0.3445 -0.3113 -0.3161 -0.2864 -0.2784 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2283  0.2289  0.2294  0.2293  0.2298  0.2303 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1157  0.1365  0.1760  0.1742  0.2133  0.2267 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100       -0.3160863

Interpretation:
Coefficient range: -0.364 to -0.278 
Mean coefficient: -0.316 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

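The p-values range from 0.116 to 0.227, so the Credit Management Group (CMG) usage coefficient is not statistically significant in any of the 139 districts. The consistently negative coefficients therefore cannot be distinguished from zero.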
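As a rough check on the figures above, a p-value of this size can be approximated from the coefficient and standard error summaries. This is a minimal sketch only: it pairs the median coefficient with the median standard error, which need not belong to the same district, and `df_resid` is a hypothetical residual degrees of freedom, not a value reported by the model.

```r
# Median local coefficient and standard error for cmg_pct,
# copied from the summary statistics above.
coef_med <- -0.3113
se_med   <- 0.2294

# Hypothetical residual degrees of freedom (an assumption for illustration,
# not a value taken from the fitted GWR model).
df_resid <- 120

# Pseudo t-value and the corresponding two-sided p-value.
t_val <- coef_med / se_med
p_val <- 2 * pt(-abs(t_val), df = df_resid)

round(c(t_value = t_val, p_value = p_val), 3)
```

Under these assumptions the p-value falls well above 0.1, in line with the "Not significant" classification reported for every district.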

Examining how formal investment usage affects use of mobile money

plot_gwr_stats("invest_pct")
tmap mode set to plotting

Summary Statistics for Formal Investment Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9539  1.2109  1.5399  1.4734  1.7335  1.8925 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.756   2.760   2.782   2.774   2.785   2.788 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4983  0.5350  0.5786  0.5985  0.6613  0.7300 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100          1.47335

Interpretation:
Coefficient range: 0.954 to 1.893 
Mean coefficient: 1.473 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

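The p-values range from 0.498 to 0.730, so formal investment usage is not statistically significant in any of the 139 districts. The standard errors (about 2.77) are larger than the coefficients themselves (0.954 to 1.893), which means the estimates are too imprecise to support any conclusion about the direction of the effect.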
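To see why an estimate this imprecise cannot be interpreted, an approximate 95% interval can be built from the mean coefficient and mean standard error reported above. This is a sketch under a normal approximation (an assumption; the model's own inference may use a t-distribution), and it treats the two means as if they described a single district.

```r
# Mean local coefficient and standard error for invest_pct,
# copied from the summary statistics above.
coef_mean <- 1.4734
se_mean   <- 2.774

# Critical value under a normal approximation (assumption for illustration).
z <- qnorm(0.975)

# Approximate 95% interval for the coefficient.
ci <- c(lower = coef_mean - z * se_mean,
        upper = coef_mean + z * se_mean)

round(ci, 2)

# TRUE when the interval contains zero, i.e. the sign is not determined.
ci[["lower"]] < 0 && ci[["upper"]] > 0
```

Because the interval spans zero by a wide margin, the positive sign of the coefficients should not be read as evidence of a positive effect.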

Examining how informal moneylender usage affects use of mobile money

plot_gwr_stats("moneylender_pct")
tmap mode set to plotting

Summary Statistics for Informal Moneylender Usage % 

Coefficients:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.4630 -0.4198 -0.3463 -0.3547 -0.2935 -0.2468 

Standard Errors:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4636  0.4644  0.4646  0.4649  0.4655  0.4669 

P-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3224  0.3691  0.4575  0.4517  0.5285  0.5976 

Significance Summary:
                             Significance Count Percentage Mean_Coefficient
1           Highly significant (p < 0.01)     0          0              NaN
2           Significant (0.01 ≤ p < 0.05)     0          0              NaN
3 Marginally significant (0.05 ≤ p < 0.1)     0          0              NaN
4               Not significant (p ≥ 0.1)   139        100       -0.3547404

Interpretation:
Coefficient range: -0.463 to -0.247 
Mean coefficient: -0.355 
Percentage of significant coefficients (p < 0.05): 0 %
$maps

The p-values range from 0.322 to 0.598, so informal moneylender usage is not statistically significant in any of the 139 districts.
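The significance bands used in the summaries above can be reproduced with a simple binning step. The sketch below is illustrative and is not the body of `plot_gwr_stats()`; the helper name `classify_p` is hypothetical, and the input is the five-number summary of the moneylender p-values rather than the full set of 139 district values.

```r
# Hypothetical helper: assign each p-value to one of the four bands
# used in the significance summary tables.
classify_p <- function(p) {
  cut(
    p,
    breaks = c(-Inf, 0.01, 0.05, 0.1, Inf),
    labels = c(
      "Highly significant (p < 0.01)",
      "Significant (0.01 <= p < 0.05)",
      "Marginally significant (0.05 <= p < 0.1)",
      "Not significant (p >= 0.1)"
    ),
    right = FALSE  # intervals are closed on the left: [a, b)
  )
}

# Min, 1st quartile, median, 3rd quartile and max of the
# moneylender_pct p-values, copied from the output above.
p_summary <- c(0.3224, 0.3691, 0.4575, 0.5285, 0.5976)

# Count of values in each band; empty bands are kept because
# cut() returns a factor with all four levels.
table(classify_p(p_summary))
```

Setting `right = FALSE` matters at the boundaries: a p-value of exactly 0.05 is placed in the "marginally significant" band, which matches the inequalities shown in the table labels.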

Concluding points

Building upon the initial analysis of the geography of financial inclusion in Tanzania, the Geographically Weighted Regression (GWR) model provides district-level insights into the factors influencing access to financial services. The GWR results reveal spatial variability in model fit, with local R-squared values ranging from 0.530 to 0.565. Although this range is narrow, it indicates that the model's explanatory power differs across regions, being somewhat higher in the southern and southeastern districts than in the northern areas. Such differences suggest that predictors like education and urbanisation have varying levels of influence in different geographic contexts, emphasising the importance of spatial analysis in understanding financial inclusion.

Geospatial analysis of the coefficients and significance levels highlights that urban population percentage and average education level are significant positive predictors of mobile money usage in most regions. However, their impact is not uniform across the country. The southern and southeastern regions, where the model fits better, show stronger positive relationships. This suggests that urbanisation and education have a more pronounced effect on mobile money adoption in these areas, possibly due to better infrastructure and higher concentrations of services. In contrast, the northern regions exhibit lower coefficients and less statistical significance, indicating that other local factors might be influencing mobile money usage there.

Conversely, the significant negative relationship between SACCO (Savings and Credit Cooperative Organization) usage percentage and mobile money usage is also spatially variable. The negative impact is more pronounced and statistically significant in certain northern districts, suggesting that traditional financial institutions like SACCOs may be more deeply rooted in these areas. This could imply competition between SACCOs and mobile money platforms, affecting the adoption rates of the latter. The geospatial distribution of this relationship highlights the need to consider local financial ecosystems when promoting mobile money services.

The geospatial patterns observed in the GWR analysis underscore the importance of considering spatial heterogeneity when addressing financial inclusion. The varying influence of different predictors across regions indicates that a one-size-fits-all approach may not be effective. Region-specific strategies are necessary to address the unique challenges and leverage the strengths of each area. For instance, enhancing urban infrastructure and educational opportunities could be prioritised in regions where these factors significantly boost mobile money usage. In areas where SACCO usage negatively impacts mobile money adoption, integrating mobile money services with existing SACCO operations or promoting awareness of the benefits of mobile money could mitigate this effect.

Understanding Tanzania’s diverse geographical landscape, including its complex physical features, economic zones, and varying levels of infrastructure development, is crucial for interpreting the spatial patterns observed in the GWR analysis. The disparities in model performance and predictor significance are deeply intertwined with the country’s varied economic activities, population distribution, and accessibility to services. For instance, the agricultural regions in the Southern Highlands and Lake Victoria basin, where reliance on farming and a predominantly rural population prevail, may exhibit lower mobile money usage due to limited financial infrastructure and lower education levels. Conversely, areas rich in tourism (such as the Northern Circuit with the Serengeti and Mount Kilimanjaro, and coastal regions like Dar es Salaam and Zanzibar) benefit from better infrastructure, higher economic activity, and greater technological adoption, leading to increased mobile money usage.

Infrastructure development varies significantly, with urban centres enjoying advanced facilities that promote financial inclusion, while rural and remote areas lag due to challenging terrain and sparse populations. These factors influence the spatial dynamics of mobile money adoption, as reflected in the varying significance of predictors like urbanisation and education across different regions. Recognising these geospatial nuances allows for a more accurate analysis that respects the local context. It also emphasises the need for region-specific strategies, such as tailored financial services for agricultural communities or leveraging the existing infrastructure in tourism hubs, rather than imposing assumptions based on experiences from more urbanised countries like Singapore.

In conclusion, the GWR analysis not only identifies urbanisation and education as key drivers of mobile money usage but also highlights how their effects vary geographically. The spatial variability in both model performance and predictor influence emphasises the need for geospatially informed policymaking. Tailoring interventions to the specific needs and characteristics of each region can enhance the effectiveness of efforts to promote financial inclusion. By leveraging geospatial analysis, policymakers and stakeholders can develop targeted strategies that address the unique spatial dynamics influencing mobile money adoption across Tanzania. This geospatial approach is essential for overcoming geographic barriers, supporting underserved regions, and ultimately contributing to the country’s socio-economic development goals.